ABSTRACT

The preliminary draft of a formal program evaluation model for special education operations in Pennsylvania is presented. Beginning chapters provide a review of literature on norm-referenced measurement, an illustration of formal and informal program evaluation in a learning disabilities context, descriptions of general implementation strategies of norm-referenced measurement, and analyses of the relationships of norm-referenced measurement to existing statewide and national assessment schemes. Subsequent chapters include a review of literature on the use of criterion-referenced measurement in formal program evaluation, a description of a criterion-referenced measurement system said to be suitable for special state-connected projects such as the National Regional Resources Center of Pennsylvania, and an explanation of machinery needed for implementation of the formal program evaluation system for special education at the state level (personnel and data banking activities). Appendixes suggest priorities in the dissemination of the draft document, provide guidelines for professional usage of accountability data at local or state levels with either total program evaluation or individual achievement monitoring, analyze possible interrelationships among existing agencies in carrying out a statewide formal program evaluation system, and outline operational steps needed to implement a statewide formal program evaluation system in its first year.
NOTE

The construction of this model by PRISE represents one of many separate but coordinated efforts undertaken by Dr. Ohrtman, Director of the Bureau of Special Education, Commonwealth of Pennsylvania, Harrisburg. Another evaluation-oriented activity that concerns special education in Pennsylvania is the series of discussions held by the Subcommittee on Evaluation, chaired by Dr. Richard K. Meyers, Special Education Department, Slippery Rock State College; this Subcommittee in turn is part of the State Advisory Board for Special Education.
PREFACE

The preliminary draft of a formal program evaluation model for special education operations in the Commonwealth of Pennsylvania is presented herein. As stated in the introduction, it should be remembered that this model is only a tentative one, quite open to change. The major purpose of this document is to provide a foundation for discussion. Many changes are anticipated. One should take particular notice of the fact that this model deals with only formal evaluation of programs in terms of commonly recognized measuring instruments; at no point was subjective accreditation-type program evaluation brought into the model.

Finally, a few words are in order regarding how this model was brought into being. The model has been constructed without special funds of any type under the regular auspices of PRISE, the Pennsylvania Resources and Information Center for Special Education. PRISE is funded under Title III of the Elementary and Secondary Education Act of 1965, and is located in King of Prussia with RRC, the Regional Resources Center of Eastern Pennsylvania for Special Education. Part of the regular functions of PRISE includes serving directly the members of the Bureau of Special Education in Harrisburg. Thus, Dr. William F. Ohrtman, Director of the Bureau of Special Education, came to PRISE around February, 1970, and asked that its personnel begin putting together a formal program evaluation model for consideration by the Bureau. This task was assigned to Dr. Barton B. Proger, Director of Evaluation and Dissemination for PRISE. It should also be realized that this model had to be constructed amidst the several other regular activities of RRC and PRISE. Thus, the previous summer (1970) and the present academic year (1970-1971) had been allotted as sufficient working time to provide the model.

Robert L. Kalapos
Director, RRC
INTRODUCTION: QUALIFICATIONS
OF THE MODEL AND ANTICIPATION OF CRITICISMS

Dr. William F. Ohrtman, Director of the Bureau of Special Education, has been long interested in formal evaluation of programs. It is important at the outset to distinguish between informal, subjective, accreditation-type program evaluation (which is currently being carried out by the Bureau) and the more formalized, objective program evaluation that relies on commonly recognized measuring devices. Further, individual pupil psychological evaluations of children are not to be confused with either accreditation program evaluation or formal program evaluation. Because the implementation of formal program evaluation has been almost totally lacking in state special education operations across the nation, little help exists in the form of guidelines which might aid the Bureau in considering what the relative advantages and disadvantages of formal program evaluation are. Thus, Dr. Ohrtman assigned me (a) to gather together a state-of-the-art paper on formal program evaluation, and (b) to develop a model for formal program evaluation that the state might consider.

It should be established immediately in the minds of the readers that I have a bias toward formal measurement procedures. I believe the single most important element of the success of any special education program is how well children have progressed over specified periods of time in highly specific areas of behavior. To me, any program evaluation made in terms of one point in time (such as the once-a-year standardized testing programs given in spring in regular education) offers very little in data for judging the effectiveness of programs. The once-a-year testing merely tells school personnel where the child is, with no indication of how school itself affected those test results. One must at least have baselines against which to measure progress. Thus, I have endeavored to embody the best measurement methodology in the model developed here.
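The contrast between a single spring testing and a baseline-plus-follow-up design can be sketched in a few lines of Python. This is only an illustration: the pupil labels and scores are invented, and the helper name `gains` is an assumption, not part of the model.

```python
# Hypothetical scores for three pupils (illustrative values, not real data).
fall_baseline = {"pupil_a": 22, "pupil_b": 35, "pupil_c": 41}
spring_score = {"pupil_a": 34, "pupil_b": 37, "pupil_c": 40}


def gains(baseline, followup):
    """Return follow-up minus baseline for every pupil present in both."""
    return {p: followup[p] - baseline[p] for p in baseline if p in followup}


# Spring scores alone only rank the pupils at one point in time.
ranking_at_one_point = sorted(spring_score, key=spring_score.get, reverse=True)

# With a baseline, progress over the period becomes visible.
progress = gains(fall_baseline, spring_score)
mean_gain = sum(progress.values()) / len(progress)

print(ranking_at_one_point)
print(progress)
print(mean_gain)
```

In this invented example the pupil with the highest spring score shows the least progress over the year, which is the kind of information a once-a-year testing cannot supply.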

The state-of-the-art paper on formal program evaluation is derived in part from an article on accountability that was requested of me by the Journal of Learning Disabilities. The other component of the evaluation package presented here, the individual achievement monitoring system, has been devised by Dr. Lester Mann and me for the Eastern Suburban Division of the National Regional Resources Center of Pennsylvania. The monitoring system has received publicity at the Council for Exceptional Children convention in Miami in 1971 (Mann and Proger, 1971) and in the Journal of Special Education (in press). The individual achievement monitoring system is mentioned in the context of this model because it has many implications for any formal program evaluation system. The monitoring system described here embodies a great many new measurement concepts taken from criterion-referenced measurement theory.



The whole notion of formal program evaluation is related to the concept of "accountability." Unfortunately, accountability has been viewed with a type of tunnel vision as being associated only with guaranteed performance contracts between publishers of educational materials and school systems. Such contracts assume various forms; some merely provide materials only, with no insurance or guarantee that certain minimum achievement will be produced in the children who use them; others provide a far-ranging package of not only materials, but also personnel, consultation, guarantees, etc. However, no matter what arrangement is reached between the publisher and the school system, there is an implied or stated assumption that "accountability" will prevail. Stated very simply, the term refers to the attempt to evaluate how well certain educational objectives were achieved. Another term which is receiving increased usage in the educational literature is "program evaluation"; indeed, one might consider this term synonymous with accountability.

The sad part of viewing accountability or formal program evaluation in connection with only performance contracting is that the vast majority of routinely funded, locally run special education programs go without any formal evaluation. The view taken in this document is that formal program evaluation should extend to all types of special education programs.

As one reads through this model, he will see that a certain amount of research design has been thrown into the total picture. Obviously, any data obtained in realistic, on-going special education programs will not be as methodologically "clean" as desired from a research point of view. There will always be the criticisms from skeptics that feel contaminated data should not be used at all; it is precisely this negative view that has kept formal program evaluation from ever reaching fruition. Nonetheless, whenever a formal program evaluation system is attempted for special education, officials must realize that such criticisms will continuously be made.

Besides the formal program evaluation system described herein and the individual achievement monitoring system, the reader will see in the appendix to this report that part of the "machinery" needed at the state level to make such evaluation systems work is a data-banking activity. A large data processing and data storage arm is needed; these facilities are usually already available at local and state levels. Arrangements would have to be made with existing staff to work out a base of cooperation. In connection with such data-banking activities, there will no doubt be cries of "invasion of privacy" and "everything our children have ever done has been reduced to a mass of numbers on computer cards." True, such dangers are always present, but it is also felt that when and if the Bureau of Special Education finally feels it has the necessary sophistication in the model so that it would want to recommend it, then safeguards on interpretation will have to be implemented.

With regard to interpretation and use of accountability or program evaluation data, gross misconceptions about accountability in general have given rise to unwarranted criticisms of systems for program evaluation. Accountability has been twisted to mean that unflattering results in a given program might even cause a teacher or administrator who has been associated with that program to be reprimanded, to lose pay, or even to be fired. This view could not be farther from the truth. If I were interpreting data obtained throughout the Commonwealth from on-going special education programs, first, at the local level, no teacher within a school organization would be compared favorably or unfavorably with another teacher. Such comparisons are NOT the purpose of formal program evaluation, although many misguided people have attempted to convey this threatening image. As I see it, the major goal of accountability is to look at the overall progress of children within one major programming approach (if only one approach is used) or to compare one programming technique with one or more others (if more than one approach is used with the same children). Second, at the state level of data-banking, a school's (or I.U.'s) program for, say, the trainable, in one part of the state will NOT be compared favorably or unfavorably with a similar program in another part of the state. This is not meant to be a vehicle for approving or disapproving on-going programs, or for supplementing or detracting from federal aid to such programs. It must be remembered that the number of confounding variables in making such threatening comparisons is far too great to allow valid comparisons of that type. The main point is that only programming techniques as such (not an individual teacher, administrator, or school organization) are on trial or held "accountable." It is the hope of the Bureau of Special Education that feedback will be gotten from the data-banking activities for making decisions on whether to keep an on-going programming technique or to change to a different one. Personnel -- administrators, teachers, etc. -- should not feel threatened in the least.

The basic philosophy of holding individual educational staff members accountable for their action or lack of action does, of course, have merit. However, the truth must be faced that accountability machinery is just not yet that refined for making such finely-honed decisions. Nonetheless, many benefits for decision-making can be gotten from the existing potential in formal program evaluation. Such benefits are explained in the enclosed model in great detail. In brief, however, data-banking activities for accountability are meant as an initial effort at the state level to provide answers to questions such as: (a) How far can children within given ranges of potential and with specified disabilities progress over a certain amount of time? (b) How much farther or less can such a child progress under a different instructional approach? (c) What cost-effectiveness factors enter the picture? The answer to (a), as simple a question as it is, is unknown for any area of exceptionality. Permanent records will be kept on the answers to these questions, as well as others. As the data accumulates to a greater extent, more complex questions can be answered. This is the type of monitoring job a state can be doing if it so desires.
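Questions (a) through (c) can be read as simple queries against pooled records. The following Python sketch shows one possible shape for such a query; the record fields, the approach labels "A" and "B", and all numbers are hypothetical and are not drawn from any actual data bank.

```python
from collections import defaultdict

# Hypothetical data-bank records: every field name and value is an assumption.
records = [
    {"disability": "learning disabled", "approach": "A", "months": 9, "gain": 9.0, "cost": 450.0},
    {"disability": "learning disabled", "approach": "A", "months": 9, "gain": 4.5, "cost": 450.0},
    {"disability": "learning disabled", "approach": "B", "months": 9, "gain": 13.5, "cost": 900.0},
    {"disability": "learning disabled", "approach": "B", "months": 9, "gain": 4.5, "cost": 900.0},
]


def summarize(rows):
    """Pool records by (disability, approach) and report simple rates.

    Assumes the pooled gain in each cell is nonzero.
    """
    pooled = defaultdict(lambda: {"gain": 0.0, "months": 0, "cost": 0.0})
    for r in rows:
        cell = pooled[(r["disability"], r["approach"])]
        cell["gain"] += r["gain"]
        cell["months"] += r["months"]
        cell["cost"] += r["cost"]
    return {
        key: {
            "gain_per_month": c["gain"] / c["months"],    # questions (a) and (b)
            "cost_per_unit_gain": c["cost"] / c["gain"],  # question (c)
        }
        for key, c in pooled.items()
    }


summary = summarize(records)
for key, stats in sorted(summary.items()):
    print(key, stats)
```

The pooling is done per programming approach rather than per teacher or school, in keeping with the point made above that only programming techniques are held accountable.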

In this somewhat lengthy introduction, I have endeavored to give the reader a flavor of the underlying philosophies that guided the development of the enclosed formal program evaluation model. The implementation of a decently functioning accountability system is a vast undertaking. With such a huge job, the whole range of measurement criticisms will be met. I just hope that special educators will not be distracted by potential criticisms to such an extent that they fail to see the forest for the trees in terms of long-range benefits.

Barton B. Proger, Ed.D.
Director of Evaluation
and Dissemination, PRISE
CHAPTER I

FORMAL PROGRAM EVALUATION (NORM-
REFERENCED MEASUREMENT):
REVIEW OF LITERATURE

Introduction

In this first chapter, several types of evaluation procedures will be surveyed and put into perspective with respect to one another. There are at least five major types of evaluation activities that are said -- rightly or not -- to fall within the province of formal program evaluation: (a) formal program evaluation (norm-referenced measurement), (b) individual achievement monitoring (criterion-referenced measurement), (c) accreditation-type on-site evaluation visits, (d) descriptive systems-analyses evaluations, and (e) demographic data record keeping. In turn, these five evaluation activities can be envisioned to occur at four levels: (a) national, (b) state, (c) regional, and (d) local. These preliminary relationships are indicated in the matrix in Figure 1.
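The crossing of five activities with four levels can be written out as a small data structure. The Python sketch below builds only the empty grid; the cell contents of Figure 1 are not reproduced in this text, so each cell is left as an empty list, and the variable names are assumptions made for illustration.

```python
from itertools import product

ACTIVITIES = [
    "formal program evaluation (norm-referenced)",
    "individual achievement monitoring (criterion-referenced)",
    "accreditation-type on-site visits",
    "descriptive systems-analyses evaluations",
    "demographic data record keeping",
]
LEVELS = ["national", "state", "regional", "local"]

# One empty cell per (activity, level) pair; the contents of Figure 1 are
# not reproduced here, so each cell starts as an empty list of notes.
matrix = {cell: [] for cell in product(ACTIVITIES, LEVELS)}

print(len(matrix))  # 5 activities x 4 levels
for level in LEVELS:
    print(level, sum(1 for (_, lv) in matrix if lv == level))
```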

The present chapter will focus on only the first type of evaluation system: formal program evaluation (norm-referenced measurement). However, passing mention in this chapter will be accorded to accreditation on-site visits, descriptive systems analyses, and demographic record keeping; these three types of evaluation systems do have some limited value and in some situations may even be deemed necessary. Nonetheless, it is the opinion of the author that only formal program evaluation and individual achievement monitoring really deserve any sustained attention when educational agencies are considering the implementation of sophisticated and worthwhile evaluation systems. Thus, a separate chapter will be devoted later to explaining the merits of individual achievement monitoring.

This paper will consider the issue of program evaluation in the area of learning disabilities. Several topics will be covered: (a) definitions of program evaluation, (b) currently used models of program evaluation, (c) a critique of those models, (d) an example of both formal and informal program evaluation in the field of learning disabilities, (e) a suggested resolution of the program evaluation dilemma, and (f) some hints of the future in learning disabilities program evaluation.

DEFINITIONS AND NATURE OF THE PROBLEM

In the field of learning disabilities, there is a great deal of confusion about what "evaluation" means. Indeed, this confusion extends into all areas of exceptionality and even into regular education. Part of the confusion stems from the great deal of emphasis given to clinical evaluations or diagnoses of individual children. Too often special educators have considered individual pupil evaluation to be synonymous with program evaluation.

Before going any further, a working definition of program evaluation must be given. I consider program evaluation to be the process of gathering evidence (test data, anecdotal teacher records, clinician observations, and so on) on the effectiveness of the total learning disabilities program (whether run by an individual public school district, a private or parochial school, a county school system or an intermediate unit, or even a state hospital). To gauge the effectiveness of the program as a whole, program evaluation relies on the individual pupil diagnoses or evaluations. This information on individual pupils is combined or averaged in meaningful ways to gauge the progress of certain types of pupils within the learning disabled program. Such pooled information will yield results in a more manageable form than the separate pupil evaluation records so that the program administrator, teacher, or other staff member can make future programming decisions on a rational basis. Too often educators have been accused of making major programming decisions on an intuitive basis -- "armchair philosophizing" (cf. Proger et al., 1970). Program evaluation, if used intelligently, can help eliminate such criticism. In summary, program evaluation can be considered a step above pupil evaluation in complexity.

To clarify further the working definition of program evaluation, one need only look at the federally funded programs under the Elementary and Secondary Education Act of 1965. Projects funded under Title III of that act are required to submit formal program evaluation data on the effectiveness of the program with children. Program evaluation has received increasing attention recently as federal and state officials become more and more aware of the low quality -- or even complete absence -- of program evaluation in federally supported projects (Smith and Brecknell, 1969; Erickson, 1970). Thus, certain sources of federal and even state funding have required learning disabilities educators to produce at least some semblance of program evaluation, no matter how poor or inappropriate. However, let us not delude ourselves. It is in the area of locally funded learning disabilities programs that program evaluation is most crucially needed and, paradoxically, most frequently absent!

Program evaluation is also frequently confused with accreditation of schools (e.g., North Central Association of Colleges and Secondary Schools, 1969). While the opinions of visiting experts, teachers, students, and parents are important, accreditation forms a distinct area of the broad field of evaluation that will not be considered in this paper. Further, the discussion of program evaluation here will be confined to student achievement, feelings, and performance. Teacher competencies, financial resources, organizational structure, etc., are left to other types of evaluation experts. I believe pupil functioning is the single most important aspect of any learning disabilities program to be examined. Program administrators and teachers will be able to obtain a great deal more detailed information for decision making from direct data on the pupils as compared to a model weighted down with other variables such as money and staff competencies. The completely generalized, competently functioning program evaluation model is far in the future, to say the least!

The reader should also be aware that several program evaluators (Scriven, 1967; Stake, 1967, pp. 525-526; Atkinson, 1967, p. 2) suggest that the process of program evaluation should not only describe the change that occurs in pupils over the course of time (such as gains in test scores) but should also judge whether those changes are acceptable or not. Some might question this viewpoint in that it infringes upon the nonevaluator-program administrator's role of decision making. Other definitions of program evaluation have been posited (Cronbach, 1963, p. 672; Griessman, 1969, p. 17; Welch, 1969, p. 429). However, the stage is now set for examining some of the major program evaluation models.

CURRENTLY USED PROGRAM EVALUATION MODELS

How does one go about designing an adequate program evaluation scheme? There are many models now available that describe the major steps in program evaluation design; as generalized guidelines, these descriptions deserve the name of "models." One should not expect to see a complicated mathematical model; such models have had success only in very specific contexts that do not possess the many complexities of an ongoing learning disabilities program (cf. Welty, 1969; Brooks, 1969; Alkin, Glinski, and Wininger, 1969).

Perhaps the best-known program evaluation model to date has been the Stufflebeam (1967) CIPP structure: Context, Input, Process, and Product. Rather than review some of the characteristics of the original CIPP model, let us look at one of the many modern adaptations of the model: the CDPP. CDPP represents Context, Design, Process, and Product. Describing the major steps in the CDPP evaluation process, Randall (1969, pp. 40-42) states: "Context evaluation consists of planning decisions and context information that serves them ... Design evaluation entails structuring decisions which depend on design information ... the objectives need to be specified operationally if possible, and activities or means of attaining them need to be specified ... After a design has been structured and is put on trial, often called the pilot test, restructuring decisions are faced. Restructuring decisions are based on process information ... After components of a design have been tested, they can be put together in a program product or field test. Since this is the first full-cycle test, the major decisions faced are whether to recycle through another full-scale field test. The information needed, called product information, entails not only evidence about effectiveness in attaining short- and long-range goals, but also effectiveness ... compared with that of another program or strategy."

Because the CIPP-CDPP-type model is the basis for many other program evaluation schemes, it would be helpful to illustrate the CDPP variation briefly with a learning disabilities problem. Let us suppose a perceptual motor training program has been deemed appropriate for use with a certain group of learning disabled children in a private school. The Context part of the evaluation cycle would involve looking at the needs of the children in detail and the associated problems behind those needs. Preliminary studies (such as individual pupil diagnoses) would be appropriate to help determine needs of the pupils. On the bases of the pupil needs and the problems underlying them, broad program goals and specific behavioral objectives are determined. Design evaluation (or in Stufflebeam's terms, Input evaluation) can be thought of as having a primary purpose of arriving at a feasible training program for the learning disabled pupils. Design evaluation entails consideration of the constraints of the private school: funds, staff skills, facilities, scheduling, etc. The program administrators and other staff members have the responsibility for examining all the specific details of the perceptual training program they originally thought appropriate; the literature -- both research and philosophical opinion -- would be searched thoroughly to gain insight into the virtues and flaws of that particular training program. At the same time, however, alternative training programs would be examined in the literature to see if an even more suitable program might be found (see Proger et al., 1970). Once these preliminary steps have suggested which perceptual training program might be most suitable with the children under the constraints of the private school, design evaluation concludes its role by specifying more fully what is to be done with the children. That is, design evaluation also implies the specification of the actual steps to be used in the training program finally selected, and the specification of a design for gathering evidence of effectiveness. Process evaluation can refer to an actual pilot test of the perceptual motor training program; evidence of effectiveness during the pilot test is used to restructure the final program that is to be used later in the regular activities of the private school. However, more often than not, pilot testing will not be possible and process evaluation will refer to in-process quality control monitoring of the final program itself; evidence will be gathered systematically throughout the actual training program and will be used to restructure the program as it is running. Product evaluation is perhaps the most familiar step, since it refers to gathering the evidence of effectiveness at the end of the perceptual motor training program. Thus, when product data is compared to pretest data and to process data, analyses can be generated which yield information that the program administrator can use as a basis for making future decisions.
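The comparison of pretest, process, and product data described above can be sketched as a group trajectory. The Python example below is a minimal illustration under stated assumptions: the scores are invented, a single measure is assumed, and every pupil is assumed to have the same number of in-process checks.

```python
from statistics import mean

# Hypothetical scores on one perceptual motor measure (illustrative only).
# Each pupil has a pretest, two in-process checks, and an end-of-program score.
scores = {
    "pupil_a": {"pretest": 10, "process": [12, 15], "product": 18},
    "pupil_b": {"pretest": 14, "process": [14, 16], "product": 17},
    "pupil_c": {"pretest": 8, "process": [9, 9], "product": 13},
}


def group_trajectory(data):
    """Return the group mean at pretest, each process check, and product."""
    n_checks = len(next(iter(data.values()))["process"])
    points = [mean(p["pretest"] for p in data.values())]
    for i in range(n_checks):
        points.append(mean(p["process"][i] for p in data.values()))
    points.append(mean(p["product"] for p in data.values()))
    return points


trajectory = group_trajectory(scores)
overall_gain = trajectory[-1] - trajectory[0]
print(trajectory)
print(overall_gain)
```

The intermediate points are what in-process monitoring would supply while the program is still running; the first and last points alone correspond to a plain pretest-versus-product comparison.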



Thus one sees that the CDPP program evaluation model generally fits all aspects of the evaluation process of any learning disabilities program. This is one of the primary strengths of the CDPP model. A more detailed discussion of context evaluation can be found in Freedman and Swanson (1969) and in Hammond (1969). Stufflebeam (1969) provides a recent discussion of his CIPP model.

Another well-known program evaluation model is that of EPIC (Evaluative Programs for Innovative Curriculums). The EPIC structure for evaluation (Hammond, n.d.; EPIC, 1968) consists of a three-dimensional figure: a cube. The "Behavior" edge is divided into three units: cognitive domain, affective domain, and psychomotor domain. The "Instruction" edge has five units: organization, content, method, facilities, and cost. The third edge is labeled "Institution" and is divided into six units: student, teacher, administrator, educational specialist, family, and community. The cube is embedded into a five-category scheme of variables. The first category -- "Prediction Sources" -- implies that one examines the various types of instruction that might be used in a given situation. The second category of "Descriptive Variables" suggests that the actual steps to be used in the instructional techniques are to be specified carefully, along with the constraints that the institution places upon the teaching. "Objectives" forms the third category of variables. The fourth category consists of the cube described above. The actual design for collecting the effectiveness data is specified in this step. The fifth category -- "Criteria of Effectiveness" -- implies the analysis of all data obtained. One can see the similarities between the CDPP model and the EPIC scheme.
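The cube can be read as the Cartesian product of its three edges. The short Python sketch below enumerates the cells using the unit names listed in the text; the variable names and the sample query are assumptions added for illustration.

```python
from itertools import product

# The three edges of the cube as listed in the text.
BEHAVIOR = ["cognitive", "affective", "psychomotor"]
INSTRUCTION = ["organization", "content", "method", "facilities", "cost"]
INSTITUTION = ["student", "teacher", "administrator",
               "educational specialist", "family", "community"]

# Every cell of the cube is one (behavior, instruction, institution) triple.
cells = list(product(BEHAVIOR, INSTRUCTION, INSTITUTION))
print(len(cells))  # 3 * 5 * 6 = 90

# Example: all cells that concern the teacher and the cognitive domain.
teacher_cognitive = [c for c in cells if c[0] == "cognitive" and c[2] == "teacher"]
print(teacher_cognitive)
```

Counting the cells this way makes plain how many distinct combinations of behavior, instruction, and institution the scheme asks an evaluator to keep in view.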

The reader is now aware of two main program evaluation models and the types of guidelines suggested by each. Schematically, the EPIC design can be said to be representative of the geometric model-building efforts (in this case, a cube), while the CIPP design characterizes the logical eggcrate pattern (the four main stages are placed horizontally on top of a rectangle, and subdivisions are placed vertically down the left side of the rectangle, thus forming an eggcrate classification scheme). However, a few other models might be mentioned here for further reference.

Scriven (1967) has produced a lengthy book chapter on what he envisions as program evaluation. He tries to formalize in much greater detail than other writers in the evaluation field what he considers to be the "methodology" of evaluation. One gets into statistical design discussions and other technical areas.

Stake (1967) builds a logical eggcrate design. His basic model consists of two major blocks of information: a "Description Matrix" and a "Judgment Matrix." Data can be subclassified in either matrix as "antecedents," "transactions," or "outcomes." The descriptive matrix is further subclassified into "Intents" and "Observations," while the corresponding dimension in the judgment matrix consists of "Standards" and "Judgments."

Atkinson (1967) divides his evaluation model into three domains according to the areas in which objectives can be constructed: structure (school plant, organization, etc.), process (instruction), and product (student output behavior).

Pohland (1970) describes a geometrical program evaluation model developed by Howard Russell and Louis Smith at the Central Midwestern Regional Educational Laboratory, Inc., St. Ann, Missouri. Like the EPIC design, the CEMREL model is a three-dimensional cube. Along one edge of the cube is the "focus" of evaluation (student, mediator, or material). The second edge deals with the "role" of evaluation (formative and summative). The final edge consists of "data" (scale measures, questionnaire responses, and participant observations).



A discussion of basic program evaluation models would be incomplete without systems analysis. In 1968 a group commissioned by the National Security Industrial Association studied the application of systems analysis in defense to the area of education. Carter (1969, pp. 22-23) summarized eight steps of systems analysis that could be useful in education: "(a) State the real NEED you are trying to satisfy; (b) Define the educational OBJECTIVES which will contribute to satisfying the real need; (c) Define those real world limiting CONSTRAINTS which any proposed system must satisfy; (d) Generate many different ALTERNATIVE systems; (e) SELECT the best alternative(s) by careful analysis; (f) IMPLEMENT the selected alternative(s) for testing; (g) Perform a thorough EVALUATION of the experimental system; (h) Based on experimental and real world results, FEEDBACK the required MODIFICATIONS and continue this cycle until the objectives have been attained." Robertson (1969, p. 31) claims "... application of systems analysis techniques to evaluation differs from PERT (Program Evaluation Review Technique). PERT focuses on the steps, the time, and other expenditures in the identified evaluation or research processes, while ... [systems analysis] should be thought of in terms of the operating program, not the evaluation process per se." Additional thoughts on systems analysis models can be found in Dyer (1969); Ammentorp, Daley, and Evans (1969); Ryan (1969); Wallace and Shavelson (1970).

CRITIQUE OF PROGRAM EVALUATION MODELS

By now I hope to have conveyed to the reader the trend in current
educational literature concerning the construction of general program
evaluation models. In recent years educators of all types have been
bombarded with an ever-increasing tide of such models (and I have sampled
only a small portion in the previous section!). It is time to stand back and
assess the relevance and success of this build-a-model marathon. First, let
us look at what some evaluators have said about their colleagues' efforts.

Many professional evaluators are skeptical of this model-building trend.
Early in the game, Cronbach (1963, p. 672) stated: "... I am becoming
convinced that some techniques and habits of thought of the evaluation
specialists are ill suited to current curriculum studies." Cronbach hit the
evaluator himself, while Stake (1967, p. 524) aimed his pen at the people
who should be using evaluation specialists: "The issue here is the potential
contribution to education of formal evaluation. Today, educators fail to
perceive what formal evaluation could do for them. They should be imploring
measurement specialists to develop a methodology that reflects the fullness,
the complexity, and the importance of their programs. They are not." Other
indications of the failure of program evaluation are given by Guba (1969),
Sorenson (1968, p. 4), and Scriven (1967, p. 53).

What are the symptoms of this failure of program evaluation in special
education? The answer is simple enough: virtually no implementation in any
area of exceptionality. Granted, the quality level of program evaluation
might have risen slightly in federally funded programs because of the thrust
for "increased accountability." But the real problem resides in the locally
funded programs, where such excellent opportunities for program evaluation
exist. Here, there is almost no program evaluation at all. How many learning
disabilities programs today are really being examined in a formal sense
using systematically gathered data? Please note that I am not talking about
program evaluation in terms of the usual indices of number of dollars spent,
number of certified staff, number of children served, etc. Rather, I am
talking about formal, statistical evidence of any gains made in the program
as a whole, derived of course from the gain data on individual pupils. I am
talking also of comparative gain data of one learning disabilities program
pitted against another program presumably aiming at the same goal but with
different means. This type of data just is not being provided on a routine
basis for decision-making in locally funded, on-going programs. About the
only evidence of formal program evaluation that is visible lies in isolated
occurrences of university research project evaluations in local schools
(just examine any professional journal!).

So much for the obvious symptoms of the failure of program evaluation.
What are the underlying causes? I think one primary factor has been the
emphasis on mass dissemination (in professional journals, conventions, etc.)
of the general program evaluation models. A few years ago, during the truly
primeval stage of development in program evaluation theory, there were
virtually no generalized guidelines to follow. Thus, initially, models such
as CIPP and EPIC performed an admirable service in causing awareness, to some
degree, in program administrators (but not in teachers!) of the need for
program evaluation and of what its basic features are. However, with the
flood of literature in this field (books, monographs, articles, speeches), I
think the administrators and teachers in the field became disillusioned.
After the initial dissemination of the models, nothing new was being said.
There is an even more basic flaw in the massive, never-ending, model-building
epidemic: lack of specific advice within the models for actual
implementation of the evaluation. If any learning disabilities educator
examines the several program evaluation models presented earlier, he will
probably remark: "I understand what you are saying, and it all seems very
logical. But I know I myself will never be able to use the model in my
particular situation because no really specific guidelines are given. I
would need expert evaluation help to use the model but do not know where to
get such help. So I will forget the whole thing!" And there is the crux of
the matter as I see it. The models have reached their level of functional
incompetence with respect to the real world.

This problem of functional incompetence of models is basically one
of analytical overkill. In the past few years, the program evaluation models
have been refined, re-refined, ad nauseam. Indeed, some evaluators have
even turned their by now finely honed analytical skills to a higher level of
synthesizing: "meta-evaluation" and "taxonomies" of evaluation designs (cf.
Scriven, 1969, and Worthen, 1968). Whether these super-analytical efforts be
worthwhile or not, we had better slow down and re-examine our position,
evaluators! The educators in the field have been left behind! Clearly, model
building is not having a very salutary effect on education. In regard to
the model-builders, Finn (1969, p. 18) asked: "... is it possible that they
have, in fact, over-analyzed the process? Have, in fact, these analyses
departed from operational reality, at least in the sense that the
practitioner would not know what to do with them?" Thus, Finn suggests
(p. 19) that perhaps program evaluation has acquired that dreaded affliction
known as "hardening of the categories!"

Let us also ask at this point in the model-building game just for whom
the recently developed models are intended. We know they are not meant for
the program administrator or teacher; they cannot handle the model on their
own. What about professional evaluators? Could the models be aimed at
increasing their competence? I think not. After reading the initial CIPP and
EPIC models years ago, I as an evaluator have not received any new insights
from the spate of publications issued since that time. The models appear to
be stimulating thought in no one. They are highly repetitious and are
probably doing more harm than good at this point.

In other words, the dissemination function of these models has outlived
its usefulness. It is time to put the models on the historical section of
the educational bookshelf. In my opinion, a model is supposed to lend a
unique perspective not ordinarily realized by the majority of practitioners
in the field for which the model is meant. The models did this years ago,
but no longer do.

A second major cause for lack of widespread implementation of program
evaluation has been the absence of an aggressive "sales campaign" by
existing agencies: county offices or intermediate units, regional materials
and/or resource centers, universities, and state departments. Educators are
traditionally slow to adopt innovations. Thus, some vigorous prodding is
needed. Existing agencies that have program evaluation consultation
capabilities must assume the responsibility to keep knocking on the doors of
potential clients. Simple advertising of the availability of such services
is not enough.

As a solution for those in need of professional evaluation assistance,
one might at this point suggest that the answer is simple: go to the local
university evaluation service bureau. However, how many educators in the
field really would feel free to call on consultants at universities? Not
very many, I am afraid. There is an inherent distrust of universities in many
educators. Some might even say: "The only consultation we ever received was a
request to do research in our school; we never got any practical benefits
from it. Any program evaluation consultation we get will probably be equally
impractical! So why bother?" Perhaps this attitude is unjustified on the
part of educators with respect to some of the more service-oriented
universities. However, the attitude does exist, and it must be coped with.

One might also suggest that the program administrator obtain consultation
from an agency like EPIC. This is fine for those who live in the vicinity
of Tucson, Arizona, or near a handful of similar agencies. However, the vast
majority of learning disabilities educators must do without such
consultation, and thus without program evaluation itself.

A much more powerful solution is needed. Before suggesting a possible
resolution of this sorry state of the art, let us examine briefly a typical
example of learning disabilities program evaluation. Perhaps too much has
been expected of formal program evaluation. Let us see just what an
evaluator might be able to deliver.

A brief example of some general aspects of program evaluation in a
learning disabilities program has been given earlier in connection with
the CDPP-CIPP model. To make that example more specific, let us assume that
a group of thirty dyslexic children have been diagnosed as having comparable
etiologies, that they lie within a relatively narrow age span, and that
other pertinent factors are comparable among the children. In other words,
meaningful comparisons can be made among various subgroups of the children.
We will also assume that concrete action has been taken to carry out the
preliminary phases of the CDPP-CIPP model. For example, during the context
evaluation phase, a diagnostic pretest of reading deficit has been given to
all children. Using this information and other data from each child's
records, needs have been determined. Since some kind of perceptual motor
training program was considered appropriate, specific measurable objectives
were specified in both the perceptual motor and reading achievement domains
of behavior; each objective was to be measured by a corresponding
standardized (or, if more appropriate, locally devised) test. Before we
enter the scene, let us also assume that the design or input evaluation
phases have been partially accomplished in that alternative training
programs have been examined, all within the light of the constraints of the
school. This sets the scene for the example to be discussed below. All of
the above steps have been accomplished by an evaluation consultant working
with the program administrator and staff.

As we enter the scene, the program evaluation is ready to conclude its
design or input evaluation phase. The evaluator must now decide what type
of data gathering scheme would be appropriate. It has been agreed among all
involved in this planning that the pretest of reading deficit can be used to
divide the pupils into three groups of ten each: minimal deficit, moderate
deficit, and severe deficit. All agree that specific information on the ways
in which these three broad classifications of dyslexic children progress
throughout the perceptual-motor training program would be valuable for
decision-making on a short or long-term basis. Since the program will be run
during the full academic year, it must be decided how many tests to give
during the year. For purposes of in-process quality control, it was decided
to give three middle-of-the-year tests as well as pre- and post-tests (all
testing occasions use the same tests, or better yet, parallel forms of the
same tests). The resultant data collection scheme is given in Figure 2. The
evaluation consultant concludes the design or input phase by specifying the
type of statistical analysis to be used on the data: in this case, perhaps
a "repeated-measures analysis of variance." It should be noted here that
complicated statistical methods should never frighten program administrators
or teachers away; the evaluation consultant has primary responsibility for
selecting, performing, and interpreting the analysis.
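
The design just described can be written out as a small computational
sketch. The Python below is only an illustration under stated assumptions:
every score is invented, the names make_example_scores and mixed_anova are
hypothetical, and the balanced split-plot partition shown (diagnostic groups
between pupils, testing occasions within pupils) is one common form of
"repeated-measures analysis of variance," not necessarily the analysis a
consultant would select for a real program.

```python
from statistics import mean

OCCASIONS = ["Pre", "1/4", "1/2", "3/4", "Post"]


def make_example_scores():
    """Invented scores: 3 diagnostic groups x 10 pupils x 5 testing occasions."""
    start = {"minimal": 60, "moderate": 45, "severe": 30}  # hypothetical pretest levels
    growth = {"minimal": 4, "moderate": 3, "severe": 2}    # hypothetical gain per occasion
    data = []
    for name in start:
        group = []
        for s in range(10):
            # (s % 5) is a pupil offset; (3 * (s + t)) % 4 is arbitrary "noise".
            group.append([start[name] + growth[name] * t + (s % 5) + (3 * (s + t)) % 4
                          for t in range(len(OCCASIONS))])
        data.append(group)
    return data


def mixed_anova(data):
    """Partition data[g][s][t] for a balanced design.

    Assumes at least 2 groups, 2 pupils per group, 2 occasions, and nonzero
    error terms; degenerate data would raise ZeroDivisionError.
    """
    G, n, T = len(data), len(data[0]), len(data[0][0])
    scores = [x for group in data for pupil in group for x in pupil]
    gm = mean(scores)
    group_means = [mean(x for pupil in group for x in pupil) for group in data]
    time_means = [mean(pupil[t] for group in data for pupil in group) for t in range(T)]
    cell_means = [[mean(pupil[t] for pupil in group) for t in range(T)] for group in data]

    ss = {
        "groups": n * T * sum((m - gm) ** 2 for m in group_means),
        "pupils_within_groups": T * sum(
            (mean(pupil) - group_means[g]) ** 2
            for g, group in enumerate(data) for pupil in group),
        "occasions": G * n * sum((m - gm) ** 2 for m in time_means),
        "groups_x_occasions": n * sum(
            (cell_means[g][t] - group_means[g] - time_means[t] + gm) ** 2
            for g in range(G) for t in range(T)),
    }
    total = sum((x - gm) ** 2 for x in scores)
    ss["residual"] = total - sum(ss.values())  # occasions x pupils within groups

    df = {
        "groups": G - 1,
        "pupils_within_groups": G * (n - 1),
        "occasions": T - 1,
        "groups_x_occasions": (G - 1) * (T - 1),
        "residual": G * (n - 1) * (T - 1),
    }
    ms = {k: ss[k] / df[k] for k in df}
    # Between-pupil effect is tested against pupils within groups;
    # within-pupil effects are tested against the residual.
    f = {
        "groups": ms["groups"] / ms["pupils_within_groups"],
        "occasions": ms["occasions"] / ms["residual"],
        "groups_x_occasions": ms["groups_x_occasions"] / ms["residual"],
    }
    return ss, df, f


ss, df, f = mixed_anova(make_example_scores())
for effect in f:
    print(effect, "df =", df[effect], "F =", round(f[effect], 2))
```

The groups-by-occasions term is the one of most interest here, since it
asks whether the three diagnostic groups progress along different curves
over the year.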

At this point, one might ask how such a formal evaluation can aid both
the program administrator and teacher in reaching rational decisions. In
Figure 2, I have sketched in average learning curve profiles for each of the
three diagnostic categories: minimal, moderate, and severe. Before entering
into any detailed discussion, the reader should note that the collection of
test data during the three in-process testing occasions (1/4, 1/2, 3/4)
constitutes the process evaluation phase of the CDPP-CIPP model, while the
post-test comprises the product evaluation phase.

FIGURE 2
SIMPLIFIED EXAMPLE OF PROGRAM EVALUATION DESIGN

[Figure: a grid of Diagnostic Groups (rows: Minimal, Moderate, Severe) by
Times of Testing (columns beginning with Pre), with an average learning
curve profile sketched in each row.]

Let us first consider the benefits to be gleaned for the program
administrator from this formal program evaluation. He may note that the
minimal reading deficit group gains nicely throughout the four quarters of
the year until the final testing, then drops off. If the administrator
examines the programming approach used with the "minimal" group, he might
be able to isolate some of the probable causes of this change for the worse.
Similarly, if the administrator considers the profile gain curve shown in
the bottom row for severe-deficit children, a large drop-off in remediation
occurs after the second quarter of the year; this in-process measure would
tell the administrator to make some on-going changes before the end of the
year. Granted, group profiles, averages, and so on, which are the working
tools of formal program evaluation, have deficiencies (e.g., covering up
finer differences among pupils). However, for situations in which more or
less common features of remediation (even if individually administered) are
applied to certain types of children under the broad heading of dyslexic,
formal program evaluation can yield valuable benefits for the administrator.
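
The kind of reading an administrator makes of the group profiles can be
sketched in a few lines of Python. This is a hedged illustration only: the
group averages are invented to resemble the curves described above, and the
function flag_dropoffs and its min_gain threshold are hypothetical, not part
of any evaluation model cited in this document.

```python
OCCASIONS = ["Pre", "1/4", "1/2", "3/4", "Post"]

# Invented group-average scores, shaped like the curves described in the text.
PROFILES = {
    "minimal":  [40, 48, 56, 63, 58],
    "moderate": [30, 36, 42, 48, 54],
    "severe":   [20, 27, 33, 34, 34],
}


def flag_dropoffs(profile, min_gain=2):
    """Return the testing occasions whose gain over the previous occasion
    falls below min_gain (an arbitrary, assumed threshold)."""
    flagged = []
    for i in range(1, len(profile)):
        if profile[i] - profile[i - 1] < min_gain:
            flagged.append(OCCASIONS[i])
    return flagged


report = {group: flag_dropoffs(profile) for group, profile in PROFILES.items()}
print(report)
```

With these made-up numbers the minimal group is flagged only at the
post-test, while the severe group is flagged from the third quarter on,
which is the point at which an in-process change could still be made.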


What about the teacher? Even in such a necessarily oversimplified
example, informational benefits aimed toward remediation should be evident.
The teacher will play the major role in the data gathering process and will
be making immediate, day-to-day programming changes (process evaluation). He
will maintain a score sheet like the one in Figure 2 but subdivided into
additional horizontal rows within each of the three diagnostic categories
already shown. Each child's name will be appropriately placed along the left
of the data matrix in the correct diagnostic category. The scores of each
child would be placed on his small, horizontal slice of the matrix. The
teacher might want to keep individual gain profile curves on each child while
gathering the data for the program administrator and evaluator. In this way,
the teacher would be able to see at a glance whether or not an individual
child's remediation was having a beneficial effect, and to act accordingly
while there is still time to make a meaningful change in the programming for
the child.

At each major point in the data collection process, the data sheet is
copied from the teacher and fed to the program evaluator to suggest and
arrange for appropriate analyses. These global program evaluation findings
(that is, averaging and pooling to gauge the progress of the general types of
children as a whole) are given to the program administrator. Thus, ideally,
both the teacher and administrator would obtain appropriate feedback for
decision making immediately. Decisions can be made during the program's
operation and at its end (product evaluation).
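
The two uses of the teacher's score sheet, individual gain profiles for the
teacher and pooled averages for the administrator, can be sketched as
follows. This is a minimal illustration under assumptions: the children,
scores, and the helper names (gain_since_pretest, children_needing_review,
pooled_profiles) are all invented for the example.

```python
from statistics import mean

# Hypothetical score sheet: diagnostic category -> child -> scores at the
# Pre, 1/4, and 1/2 testing occasions. All values are invented.
SCORE_SHEET = {
    "minimal":  {"Child A": [52, 58, 63], "Child B": [55, 56, 55]},
    "moderate": {"Child C": [41, 45, 50], "Child D": [39, 44, 47]},
    "severe":   {"Child E": [25, 25, 24], "Child F": [28, 33, 37]},
}


def gain_since_pretest(scores):
    """Latest score minus the pretest score for one child."""
    return scores[-1] - scores[0]


def children_needing_review(sheet, min_gain=1):
    """Teacher's view: children whose gain so far is below an assumed threshold."""
    return [child
            for rows in sheet.values()
            for child, scores in rows.items()
            if gain_since_pretest(scores) < min_gain]


def pooled_profiles(sheet):
    """Administrator's view: the average score per occasion within each category."""
    return {category: [mean(column) for column in zip(*rows.values())]
            for category, rows in sheet.items()}


print(children_needing_review(SCORE_SHEET))
print(pooled_profiles(SCORE_SHEET))
```

Note that the pooled profile for the minimal group still rises even though
one child in it shows no gain, which is the "covering up" of individual
differences that group averages are prone to.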

It must be remembered that this is a quite simple example of a program
evaluation design. The sophistication of the formal analysis and feedback
can increase according to the desires of the administrator and the
flexibility of the program itself. Of course, the ultimate success in terms
of utility of any formal program evaluation depends on the willingness of the
administrators and staff to use the findings in an intelligent way. Formal
program evaluation does have limits (Stake, 1969; Wardrop, 1969). A great
deal of debate has also centered around the differences between program
evaluation and tightly controlled research (Schalock, 1970). No one will deny
that evaluation studies lack a great deal of experimental control in the
purist's sense of the word. However, if formal program evaluation is coupled
with informal program evaluation, an intelligent basis for making decisions
arises.

How do informal program evaluation methods enter the picture? Teachers
and other staff members are continuously making use of these techniques when
they administer their "homemade" tests, construct anecdotal records on
individual children, make on-the-spot observations of emotional
difficulties, form estimates of ability to interact with classmates, and so
on. Too often these types of data are shunted aside as being
"nonstandardized," "subjective," etc. Why is it that teachers using their
informal, subjective assessment techniques often come up with a much more
effective remedial prescription for children than do objective, outside
"experts" with all their standardized testing instruments? Also, how many
federally funded programs for the disadvantaged or other exceptional
populations have been judged dismal failures in terms of standardized test
data alone? Formal testing evaluation is quite limited at times, and for
this reason alone informal data gathering procedures should be used wherever
appropriate to obtain a more complete picture of what is occurring in a
program. Karl (1970), Reynolds (1967), and Kunzelman (1969) have stressed the
great potential of informal teacher evaluation. Clearly, both formal and
informal evaluation procedures are needed in any serious assessment effort.

CHAPTER III

GENERAL IMPLEMENTATION STRATEGIES 
OF FORMAL PROGRAM EVALUATION 
(NORM-REFERENCED MEASUREMENT)

Thus far, many grandiose schemes have been advanced for carrying out
program evaluation concepts in a learning disabilities context. But who
will be available to the program administrators and teachers for providing
custom-made evaluation consultation? Most existing general service agencies
at the district, county, regional, and state levels do not have full-time
program evaluation consultants. And I hope I have demonstrated that the
evaluation models are far too general to be of any real help for specific
program evaluation problems. We have also discussed why universities
probably will not be asked to provide consultation in this area. I would
like to propose a new type of general service agency that might form part
of the answer. It is time to stop building models and start building
consultation agencies.

The major thrust in any attempted resolution of the poor quality of
existing program evaluation in learning disabilities, in my opinion, must lie
in providing custom-made evaluation consultation to any qualified
professional in need of it. Ideally, I would suggest that program evaluation
centers be set up in strategic locations across each state. However, I
realize such schemes are not always practical, and some compromise must be
found.

There are two main avenues that appear feasible. First, county offices
could hire one or two program evaluation specialists. Such people would have
training at least at the Master's degree level in educational research and
measurement. Thus, the county office could provide custom-made consultation
not only to the various exceptionality programs run by the county but also to
those special education programs run on an individual school district basis.
In fact, the evaluation specialists could probably also handle program
evaluation consultation requests from regular educators in each of the
individual school districts in the county. The entire educational community
stands to benefit.

If for political reasons or otherwise, it does not seem likely that a
county service unit will become program-evaluation-minded, then regional
general service agencies would have to be staffed with program evaluation
specialists. For example, a large number of federally funded instructional
materials/media/resource centers have sprung into operation during the last
few years. These centers usually serve large but still realistically sized
regions. The provision of individualized program evaluation consultation
services would be easy to append to the existing operations. Hopefully, both
the county evaluation units and the media/materials/resource center
evaluation units would have state sanction, encouragement, and even funding.
The expenditure for salaries and operating costs of the two or more
evaluation specialists in either type of agency would be negligible compared
to the benefits which could be reaped in program improvement.

One cautionary statement of policy must be advanced, however, from past
experience in such ventures. It is quite clear that many ongoing programs,
perhaps even the majority, will not make use of a service even though it is
announced as being available, free, and sophisticated. Program administrators
and teachers have seen too many gimmicks and "revolutionary ideas" come down
the road in recent years. Thus, program evaluation consultation services
must be sold. It is the responsibility of the evaluation specialists to
undertake a vigorous advertising campaign (brochures, monographs, on-site
visits, telephone calls, personal letters, etc.) to stir up individual
consultation requests. It is understandable for a program not to want to
involve itself in more "paper-pushing" than at present if possible; to many
educators, the regional or county evaluation unit appears to be just one
more example of bureaucratic entanglements. Such negative images must be
offset through proven performance.

Besides actively soliciting evaluation consultation business from the
programs in its service region, the evaluation unit should also conduct
program evaluation workshops that serve as a dissemination function of the
agency. Here is one rare instance where the general program evaluation
models can still be of some value to the uninitiated. A small number of
clients would participate in the workshop. The subject matter might consist
of simulated evaluation exercises in learning disabilities or of
back-and-forth discussion of actual clients' problems.

The main service of the evaluation units would be to offer individualized,
custom-made program evaluation consultation on demand by any client. However,
the agency would be remiss if it did not engage in information retrieval and
dissemination in program evaluation. For example, in the "design" or "input"
phase of the CDPP-CIPP model, the final program of remediation must be
decided upon in the light of competing approaches. How does one obtain
information on all these competing brands of treatment? The county or
regional service unit could house an information collection of research
journals, ERIC, government publications, professional books, curriculum
guides, technical reports, etc. Any client in the service region would be
allowed to phone or write in a request to the center for a sophisticated
literature search of all relevant findings in the area of concern. Also, in
the selection of appropriate testing instruments, comparative information on
prices, technical qualities, etc., could be provided. The evaluation agency
would also be responsible for disseminating information on existing guides
to program evaluation (e.g., Annas and Dowd, 1966; Grobman, 1968; Center for
Instructional Research and Curriculum Evaluation, and Cooperative
Educational Research Laboratory, Inc., 1969; Meierhenry, 1969; Ahr and Sims,
1970; Mosher, 1968).

Let me conclude my "grand scheme" by throwing out a few words to those
who may not agree with these ideas. A lot of potentially valuable schemes in
learning disabilities die shortly after birth because of too much talk and
too little action (indeed, an analogy may be made with the case of program
evaluation models in all areas of education). The above "solutions" to the
program evaluation dilemma in learning disabilities are, to me, rather
obvious. We do not need a lot of local and state committees to conduct
"studies" of the problem. All one needs is a few key people who can get
things moving and keep them moving. The above ideas, all of them, have
already proved effective in realistic, ongoing practice. There is simply no
longer any excuse for the sad state of program evaluation in the field of
learning disabilities!

Before leaving the realm of personalized program evaluation consultation
services, a few words about the role of the evaluation specialist would be
appropriate. I feel that, with occasional exceptions, fairly sophisticated
statistical-inferential evaluation schemes can be applied to most learning
disabilities programs in operation. Each program evaluation scheme is unique
and usually applicable only in a narrow range of situations, before the
evaluator has to shift gears entirely and devise a different design. I also
want to dispel the myth that the program evaluator is, or should be, a
"man-of-all-seasons" with respect to the whole range of educational
technology. Most of the recent breed of evaluation specialists are usually
competent only in the fields of statistical analysis, design methodology,
and test construction and use. These specialists are not experts in
curricular philosophy and thus cannot and should not make value judgments
about remediation planning. If program evaluators are being honest with
themselves, I seriously doubt whether they can pronounce judgment on a
program, other than to yield some inferential data on the quality of the
intermediate and final products of the remediation and to suggest possible
interpretations to the program personnel. Only if one can find an expert
curriculum specialist "retreaded" into an evaluation expert (and I mean
fully retreaded!) can the program personnel expect to have ultimate value
judgments about their program made for them by the evaluator. Almost without
exception, the program administrator and his staff members must make the
final value judgments about the program. I also want to make clear that I am
not asking for programs to be unrealistically twisted into highly
sophisticated research projects. This would be the usual criticism against
one who emphasizes as much formal design methodology as possible in a given
situation. All that I am advocating is that the field practitioner and
program evaluator join heads in coming up with the most sophisticated
evaluation design possible for the particular project in question without
project distortion.

CHAPTER IV

RELATIONSHIPS OF FORMAL PROGRAM 
EVALUATION TO EXISTING STATEWIDE 
AND NATIONAL ASSESSMENT SCHEMES

All of the discussion thus far has emphasized custom-tailored program
evaluation schemes. In one case, for example, a perceptual-motor training
program might employ the Southern California Perceptual Motor Tests, while
a similar program in a different region might administer the Frostig Develop-
mental Test of Visual Perception. It is difficult, if not impossible, to
compare the results of one program evaluation with those of another. It has
been tacitly assumed that the results of any given program evaluation scheme
are useful only to that specific program. Could a more generalizable program
evaluation scheme be achieved for almost all programs in all areas of ex-
ceptionality? In other words, could comparable program evaluation schemes
be devised? Current activities in regular education indicate the answer is
"yes." There are two main facets to this issue: (a) statewide assessment, and
(b) national assessment.

Several states have initiated statewide assessment or evaluation schemes.
In general, a group of subject matter experts and others has agreed upon a
series of measurable objectives in the various domains of student behavior that
any regular educational program would hope to achieve. A series of tests is
found or devised for each major objective. Schools of various types of specified
characteristics (such as pupil population size, community size, geographic
location, etc.) are sampled randomly. The same battery of tests is adminis-
tered by local personnel in the selected schools. From such test data, score
distributions and norms are derived. Finally, individual schools get feedback
on how their students compared with similar (and dissimilar) students across
the state; manipulable characteristics of the schools that appear to be
highly related to ongoing pupil behavior are also identified (such as
academic preparation of teachers, per capita expenditure, etc.). In most
statewide assessment efforts, the battery of tests is administered only once
a year in only a few grades; no gain data are gathered.
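
The norm-derivation step described above can be pictured with a small
sketch. It is only an illustration, not the procedure of any particular
state: the function name, the tie-handling rule, and the ten school means
are all hypothetical. It shows how one school's mean score might be located
within the score distribution of the randomly sampled schools.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(norm_scores, score):
    """Percent of the norm sample scoring below `score`, counting ties as half.

    `norm_scores` stands in for the score distribution obtained from the
    randomly sampled schools (hypothetical data, not from any state program).
    """
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)          # schools strictly below
    ties = bisect_right(ordered, score) - below  # schools with the same score
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical mean reading scores from ten sampled schools.
norm_sample = [42, 47, 51, 53, 55, 58, 60, 63, 67, 72]

# Feedback for one school whose mean score is 58.
print(percentile_rank(norm_sample, 58))
```

In practice a separate distribution would presumably be kept for each stratum
(community size, region, and so on), so that a school could be compared with
both similar and dissimilar schools.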

Dyer and Solomon (1970, p. A) have stated: "Ultimately, we need to be able
to answer the question: What educational processes work in what kinds of schools
for what kinds of kids?" One must remember, however, that these pilot efforts
have been initiated only in the realm of regular education; special education,
in most cases, has not even been touched. O'Reilly (1970, pp. 3-4) describes
New York's statewide assessment program: "Each fall, all public and nonpublic
school pupils in grades 1, 3, 6, and 9 have received certain standardized tests:
a readiness test for grade 1 and tests in reading and arithmetic for grades 3,
6 and 9 ..." However, O'Reilly does not feel a once-a-year data collection
is adequate for program decision making at the state level; he suggests that
more data collection points be inserted into the course of a year. Among
other things, meaningful gain data can thus be generated. One can see the
analogy with the custom-tailored program evaluation example mentioned earlier
with respect to gain analyses.

Loadman and Major (1970) have described Michigan's statewide assessment
efforts. Educational Testing Service (ETS) of Princeton, New Jersey, helped
construct tests to measure program objectives considered suitable to the two
grades selected for assessment: 4 and 7. Besides providing each school
building within a single school district with results, more general results
will be given by a two-way classification of community type (5 types) by region
(4 regions). Other analyses will also be performed.

The Bureau of Educational Research of the University of Virginia has been
working with the Virginia State Department of Education since August, 1969, in
one of the newer statewide assessment efforts. Woodbury et al. (1970, p. 7)
say: "Specific behavioral objectives ... include English (literature,
language, composition), Mathematics, Reading, Science, Social Studies as well
as personal and social categories of affective behavior. More general be-
havioral objectives were developed for Foreign Language, Health and Physical
Education including psycho-motor skills, Vocational Education, Early Child-
hood Education, Work Study and Library Skills, Special Education, Art and
Music."

Other aspects of statewide assessment have been described by Kearney 
(1970), Michigan Department of Education (1969), and the Pennsylvania Depart- 
ment of Education (1968). 

I have mentioned such statewide program assessment or evaluation efforts
in the hope of stimulating learning disabilities educators and other special
educators into thought about devising a similar model in their respective
domains. The possibilities are exciting or frustrating, depending upon one's
view of statistics and testing. Will there be problems of major proportions
in adapting such schemes to special education? Most certainly! For example,
each area of exceptionality will probably have to be treated separately. The
physically handicapped cannot be expected to take some physical performance
tests, while the severely retarded will not be able to wade through all but
the simplest conceptual achievement tests. It is my hope that learning dis-
abilities educators will at least try to adapt some of the ideas of statewide
program assessment for their own area.

However, one need not stop at the state level in the attempt to devise a
"standardized" program evaluation system. The National Assessment of Educational
Progress (NAEP) has been underway for a few years in regular education. This
effort began in 1964. Since the ideas are basically the same as in some of
the statewide assessment schemes, the reader can refer to the large body of
literature on the subject (Saylor, 1970; Katzman and Rosen, 1970; Groff,
1970; Womer, 1970; Findley, 1970; Katzman, 1970; Caps, 1970; Ebel, 1970).

CHAPTER V

USE OF CRITERION-REFERENCED
MEASUREMENT IN FORMAL
PROGRAM EVALUATION, IN
DISTINCTION TO NORM-
REFERENCED MEASUREMENT:
REVIEW OF LITERATURE

Introduction



The first four chapters of this monograph have considered the use of
norm-referenced measurement in conducting formal program evaluation. In
other words, standardized tests are used in accord with accepted research
theory. Such an evaluation strategy is quite appropriate when only a global
overview of an on-going program is desired. A classical research strategy
is used which would have at least a pretest and a posttest, and preferably
one or more equispaced measures during the in-process part of the program.
However, there will no doubt be special projects with which the Commonwealth's
Bureau of Special Education will be connected and for which the usual class-
ical research evaluation design will not be adequate. Such situations lead
one to a much more intensive type of formal program evaluation known as
criterion-referenced measurement. A case in point with which most members
of the special education staff throughout the Commonwealth will be able to
identify is the National Regional Resources Center of Pennsylvania (NRRC/P).
Here is a major project that is linked directly to the state Bureau of Special
Education, as well as to regional special education agencies in the central
and eastern parts of the state. Another aspect to criterion-referenced
measurement that will be discussed in another chapter is data-banking activ-
ities. For ease in discussion, the following abbreviations will be used:
norm-referenced measurement (NRM), criterion-referenced measurement (CRM),
and data banking (DABA).

All material in these next few chapters that pertains to NRRC/P, CRM,
and DABA was produced in connection with articles Lester Mann and Bart Proger
are writing for project dissemination purposes with NRRC/P. The materials
contained herein have been modified so as to tie in directly with the many
aspects of the total formal program evaluation model presented in this monograph.

Consider a child who has been referred to NRRC/P as having specific read-
ing disability to the extent that he cannot function at even a first grade
independent reading level. Suppose further that, as part of the psychoeduca-
tional programming for this child, one specific objective in picking up the
child at his current level of functioning and carrying him forward is to have
him recognize letter differences among vowels embedded in C-V-C trigrams. Pre-
sumably, during the initial referral process, this child has already been
diagnosed as having a deficiency in this particular reading skill area. Further,
other components of the reading process will have been similarly diagnosed to
provide some rough basal guidelines of where the child presently stands. How-
ever, it must be emphasized that no undue weight will be given to basal func-
tioning levels. Rather, the emphasis will be on what final levels of functioning
the child achieves. This measure is what really constitutes the pay-off
evaluation of success.

True, in a tightly controlled experiment one is interested in pre-post
differences within and among treatments (the statistical significance
phenomenon). With respect to the real world, however, many researchers have
been questioning the legendary thrust toward significance. We need a model
different from the usual experimental one to answer the types of practical
questions that NRRC/P is asking. As mentioned previously, the project wants
to answer the frequently asked but as yet unresolved questions of: (a) how much
success can be expected in certain specific skills associated with selected
subject content areas as taught by a specific approach "A"; (b) how long it
took a certain approach "A" for teaching that skill to reach the observed level
of success in (a); and (c) how the answers for questions (a) and (b) for
approach "A" compare to competing approaches "B", "C", etc. Because the
psychoeducational programming thrust of some components of NRRC/P demands that
programming recommendations be made in terms of a highly specific subject con-
tent analytical breakdown of the total task into its subskills, the usual
standardized test, classical evaluation design is not appropriate.

Thus, NRRC/P has decided upon the use of criterion-referenced measure-
ment, with overtones of achievement monitoring and data bank activities.
Popham and Husek (1969, p. 2) have given one interpretation of criterion-
referenced measurement (CRM): "It is not possible to tell a norm-referenced
test from a criterion-referenced test by looking at it. In fact a criterion-
referenced test could also be used as a norm-referenced test, although
the reverse is not so easy to imagine. ... At the most elementary level,
norm-referenced measures are those which are used to ascertain an individual's
performance in relationship to the performance of other individuals on the
same measuring device. ... Criterion-referenced measures are those which are
used to ascertain an individual's status with respect to some criterion, i.e.,
performance standard. It is because the individual is compared with some es-
tablished criterion, rather than other individuals, that these measures are
described as criterion-referenced. ... We want to know what the individual
can do, not how he stands in comparison to others."

Nonetheless, Simon (1969, p. 259) cautions CRM advocates not to get
carried away in the wash of jargonese: "... strictly speaking the distinction
between criterion-reference and norm-reference applies not to the test but to
the test scores. In other words, the distinction does not relate to the nature
of the test or to the content or form of the items, but concerns primarily
the interpretation and use of the scores from the test. It is perfectly ap-
propriate for a single test to report both absolute performance (criterion-
referenced) scores and relative-performance (norm-referenced) scores."

While Simon is technically correct, NRRC/P will nonetheless be forced
by the very nature of its objectives to make a working distinction between
NRM tests (standardized ones, i.e., those with norms) and CRM tests (custom,
project-constructed tests). Getting back to the example at hand of the child
getting training in recognizing vowel differences embedded in C-V-C trigrams,
a CRM test would be constructed for the measurement of degree of success at
the end of the week-and-a-half (or whatever) unit of instruction. The measure-
ment experts on the NRRC/P staff would construct the CRM instrument, which
would become a part of the achievement monitoring system (AMS) for this child.

The advantage of CRM testing is that the project personnel decide what the
criterion of degree of success should be for a child with disabilities such as
the present subject exhibits. Perhaps for this particular CRM test of various
types of C-V-C trigrams, the NRRC/P staff will decide that 65% competency is
needed before the child is allowed to move on to the next sequential area of
subject matter. For a more crucial subskill area, perhaps 85% competency on
the CRM test will be demanded. Flexibility, realism, and practicality are
primary attributes of the CRM system. For the better part of this century,
special educators have been guessing at the answers to questions such as (a),
(b), or (c). Other than a few isolated experiments in often contrived en-
vironments or rather loosely conducted demonstration projects, the answers to
such questions have gone begging. Hopefully, the NRRC/P, through its CRM-AMS,
will begin to build a data bank (DABA) from which future educational researchers
and practitioners can draw.
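
The criterion decision just described can be sketched in a few lines. This
is a minimal illustration only: the function, the dictionary of criteria,
and the skill labels are hypothetical, with the 65% and 85% figures taken
from the example above rather than from any NRRC/P specification.

```python
def mastery_decision(items_correct, items_total, criterion):
    """Return (proportion correct, True if the pupil may move on).

    `criterion` is the proportion of items that project staff require
    for the subskill being tested (a staff judgment, not a norm).
    """
    if items_total <= 0:
        raise ValueError("a CRM test needs at least one item")
    proportion = items_correct / items_total
    return proportion, proportion >= criterion


# Hypothetical criteria echoing the text: 65% for the C-V-C trigram unit,
# 85% for a more crucial subskill area.
CRITERIA = {"cvc_trigram_vowels": 0.65, "crucial_subskill": 0.85}

# A child who answers 14 of 20 items correctly on each test.
for skill, criterion in CRITERIA.items():
    proportion, passed = mastery_decision(14, 20, criterion)
    print(skill, proportion, "move on" if passed else "re-teach and retest")
```

The same raw score thus leads to different decisions depending on the
criterion the staff has attached to the subskill, which is the flexibility
the CRM system is meant to provide.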

It should be noted also that the results of the last few years of federally
funded projects will be utilized to their maximum potential in establishing
and operating the CRM-AMS system. For example, for the purposes of breaking
the sequential arithmetic curriculum into its components and for gaining pro-
gramming ideas, project PRIMES will be utilized. Further, projects that are
generating program materials along lines of a sequential task analysis/be-
havioral objectives basis will be contacted as sources of materials. In terms
of specifying behavioral objectives and developing CRM test instruments, the
Instructional Objectives Exchange (IOX) housed with the Center for the Study
of Evaluation at UCLA will be tapped wherever appropriate (see Skager, 1970).

For years the main thrust in educational measurement was away from teacher-
made tests and towards standardized instrumentation. No doubt a large causative
agent in this trend was the great volume of ever-increasingly sophisticated
educational research studies, which usually emphasized standardized tests and
rating scales. The "home-made" or locally produced variety of test was somehow
frowned upon and judged useful only in granting report card grades but never
for whole-year or global program evaluations. Further, if CRM tests are to
be used quite frequently as an in-process type of quality control at the ends
of major units or blocks of instruction in the subject matter sequence, then
by the very nature of this frequently occurring measurement task, the tests
must be custom-built to the user's requirements as the measurement needs arise.
In other words, what we are saying, curiously enough, is that "home-made"
testing instruments are back in vogue but, more importantly, are also back
in respect, when used intelligently and legitimately. This almost circular
historical trend in measurement methodology is indeed strange but, in
education, not surprising. Because CRM is rather new in the field of special
education, a review of the literature in this field will be helpful in under-
standing part of the function of NRRC/P. Also, AMS and DABA literature will be
covered for the same reasons.



One must realize the gist of what is being proposed here. A national
project is considering the use of home-made tests (albeit in the refined vein
of CRM) to answer some research questions of high priority in the LD and EMR
fields. Cannot this enterprise be questioned on the grounds that home-made
tests, even of the "higher" CRM variety, still have the often-cited flaws
of "looseness" in measurement methodology? Are not CRM tests still plagued
by subjectivity and possibly by lack of adequate reliability and validity?
Klein (1970, p. 3) has raised some of these questions, and his arguments merit
serious consideration: "The ... use of criterion-referenced measurement would
be a laudable practice if one knew how to determine what criterion objectives
to specify, or what level of performance constitutes their attainment, or how
to interpret the results if the objectives are or are not achieved. To
illustrate this point, let us suppose that a new course unit in 10th grade
biology led to 30% of the students attaining all of the unit's 20 objectives,
50% of the students attaining 15 objectives, and only 20% of the students
achieving less than 10 objectives. These results look very impressive and a
school official might be very pleased with the effectiveness of the program.
But would he still be happy if he discovered that most students could achieve
10 of these objectives before taking the unit, or that the criterion of at-
tainment was 1 out of 5 items correct per objective, or that the items used
to measure an objective were not truly representative of the range of items
that might have been employed, or that 80% of the students at other schools
(having students of comparable ability) attained all 20 objectives using a
criterion of 4 out of 5 items correct per objective?"

Klein (1970) goes on to propose an eclectic test construction model based
upon both CRM and NRM procedures. In effect, he is aiming his comments at
standardized test producers and hopes that they will begin to issue instruments
that embody the best features of both CRM and NRM. The first step is to specify
objectives in operational terms. Klein recommends that each objective embodied
in the test should have at least three items to measure it. This guideline can
be used in a forward sense to determine how long the test will be, or in a back-
wards sense to determine how specific the objectives should be. The second step
is to find test items for each objective. Not only should the items be repre-
sentative, but they should also represent different difficulty levels within
the objective. The third step is to find test items that tap related objectives.
"The reasons for measuring these kinds of related objectives are that they
(a) provide information about the unanticipated outcomes of educational programs,
(b) indicate how close a program (or student) came to meeting or surpassing
the objectives, and (c) show the level at which subsequent educational treat-
ments should be pitched" (Klein, 1970). The fourth step is to give the test user
for each objective measured by the test a score and its interpretation: "Donald
Jones (or Program #3) got four of the six items correct on objective number 7
(addition of whole numbers less than 100). Approximately 80% of the other
students in Donald's class did this well. Students of equal ability in other
classes (or programs) only got one-third of the items correct which is typical
of the second graders in this state (i.e., the median score statewide on this
objective is 33% correct)" (p. 4). With respect to writing objectives in terms
of difficulty levels and levels of intellectual functioning, Klein recommends
such "atlases" as Bloom (1956) and Guilford (1967).
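
Klein's fourth step, a per-objective score reported with both a criterion-
referenced and a norm-referenced interpretation, might be sketched as
follows. The function name, the dictionary keys, and the class and state
data are hypothetical; only the "four of six items" example is taken from
the passage above, and reading "did this well" as "scored at least as high"
is an assumption.

```python
from statistics import median


def objective_report(correct, n_items, class_correct, state_correct):
    """Combine an absolute score with two relative comparisons for one objective.

    correct        -- items the pupil answered correctly on this objective
    n_items        -- number of items measuring the objective
    class_correct  -- classmates' item counts on the same objective
    state_correct  -- item counts from a (hypothetical) statewide sample
    """
    at_least = sum(1 for c in class_correct if c >= correct)
    return {
        # criterion-referenced: what the pupil can do
        "percent_correct": 100.0 * correct / n_items,
        # norm-referenced: how classmates and the state compare
        "class_percent_at_least": 100.0 * at_least / len(class_correct),
        "state_median_percent": 100.0 * median(state_correct) / n_items,
    }


# Donald answers four of the six items on objective 7.
report = objective_report(
    correct=4,
    n_items=6,
    class_correct=[4, 5, 4, 6, 2, 4, 5, 3, 4, 6],
    state_correct=[2, 2, 1, 3, 2],
)
print(report)
```

Reporting both kinds of figures side by side is what lets the test user see
whether an apparently adequate absolute score is also typical, unusually
high, or unusually low for comparable pupils.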

It should be noted that the IOX is now an independent, non-profit cor-
poration apart from the UCLA Center for the Study of Evaluation, which is dir-
ected by Dr. Marvin C. Alkin. The IOX is directed by Drs. W. James Popham, Eva
Baker, and John McNeil. This is effective May 31, 1970.

Mayo (1970) has argued elegantly for the individualization of instruc-
tion by means of appropriate CRM measurement. He calls the practices "mastery
learning" and "mastery testing." Mayo suggests that a new conceptualization of
mental ability is necessary if a true matching of instruction to a child's
specific needs is to occur: "Rather than thinking of aptitude as a kind of
ceiling, Carroll (1963) suggested that aptitude may be related to the amount
of time necessary to achieve mastery" (p. 2).

The "mastery model" described by Mayo (1970, p. 2) has five features:
"(a) Inform students about course expectations, even lesson expectations or
unit expectations, so that they view learning as a cooperative rather than as
a competitive enterprise. (b) Set standards of mastery in advance; use pre-
vailing standards or set new ones and assign grades in terms of performance
rather than relative ranking. (c) Use short diagnostic progress tests for
each unit of instruction. (d) Prescribe additional learning for those who
do not demonstrate initial mastery. (e) Attempt to provide additional time
for learning for those persons who seem to need it."

In developing CRM (or "mastery") tests, Mayo points out that the usual
requirements for maintaining an average item difficulty level of about 50%
no longer hold; instead, the scores of pupils will tend to cluster in a skewed
distribution around perhaps an 85% difficulty level. Educators molded more
or less along traditional test construction lines will be somewhat disturbed
in that "mastery tests" will seem to be almost too easy for a large portion of
the pupils. However, this is in line with the different conception of learning
that CRM is based upon. Given enough time and individualization of instruction,
pupils should be able to achieve the majority of objectives in basic skill
areas. This is the premise NRRC/P is working under. "The few who fail the
item show a clear deficit, and this feedback indicates need for additional
remedial learning sessions and repeated testing until items are passed" (p. 3).

Cox and Sterrett (1970, p. 227) have proposed a model that combines the
best features of NRM and CRM: "(a) a precise description of curriculum ob-
jectives and a specification of pupil achievement in reference to these objectives,
(b) the coding of each item on a standardized test with reference to the cur-
riculum, and (c) the assignment of two scores to each pupil, one reflecting his
achievement on items that test content to which he has been exposed; the other
his achievement on items that test content beyond his present status in the
curriculum or not represented in the curriculum at all."
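
Part (c) of the Cox and Sterrett model can be illustrated with a short
sketch. The coding scheme used here (an integer unit number per item, with
None for items outside the curriculum) is an assumption made for the
example and is not taken from their paper.

```python
def two_scores(item_units, responses, pupil_unit):
    """Return (exposed_score, beyond_score) for one pupil.

    item_units -- curriculum unit each test item is coded to, or None when
                  the item is not represented in the curriculum at all
    responses  -- True where the pupil answered the item correctly
    pupil_unit -- last curriculum unit the pupil has been taught
    """
    exposed = beyond = 0
    for unit, right in zip(item_units, responses):
        if not right:
            continue
        if unit is not None and unit <= pupil_unit:
            exposed += 1   # content the pupil has been exposed to
        else:
            beyond += 1    # content ahead of, or outside, the curriculum
    return exposed, beyond


# Six items coded to units 1-4 (one item outside the curriculum);
# the pupil has completed unit 3.
item_units = [1, 1, 2, 3, 4, None]
responses = [True, True, False, True, True, True]
print(two_scores(item_units, responses, pupil_unit=3))
```

The first score can be read against a mastery criterion, while the second
indicates how much the pupil already knows of material not yet taught.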

One extensive application of CRM that has many implications for NRRC/P is
the Comprehensive Achievement Monitoring Project (CAM) of Dwight W. Allen and
William P. Gorth of the University of Massachusetts. O'Reilly, Schriber, Gorth,
and Wightman (1969) have prepared a lengthy manual that documents the implemen-
tation of a complete CAM system. Gorth has had primary responsibility for de-
veloping this CRM-CAM design. In the introductory part of their manual, the
authors state: "The CAM procedure focuses upon the evaluation of achievement
by more or less continuous monitoring of student performance relative to
specific course objectives. Unlike traditional evaluation procedures which
generally involve testing of students on discrete units of material, CAM generates
performance data on all course objectives ... (at several points in time through-
out the instructional sequence). The procedure consists of a battery of
parallel (or equivalent) test forms which contain items representative of the
span of the entire course and which are administered to all students at frequent,
pre-set, equal intervals. Each form contains an equal number of items (re-
lated to the specific objectives of a course) for each instructional interval
and each item is used on only one form. Items are assigned to test forms by
random sampling techniques and each form is from 10 to 70 items in length.
Through the use of the random sampling of test items, it is literally possible
to test hundreds of specific objectives over a group of students. Each test
form is similar to a final test for a course. As the course progresses, the
student should be able to answer an increasing number of items on the test
forms corresponding to an increasing number of objectives mastered. Each student
receives a particular test form only once. Over the duration of these tests ...
makes it possible to sample performance on every course objective over a given
group of students at every testing." One can think of each test form given
throughout the course of instruction as representing a barometer, with increas-
ingly more difficult objectives corresponding to gradations of degrees. The
more success the student achieves, the higher the barometer registers, and
these readings can be put into a trend analysis over the passage of time for
each student. Thus, individualized instruction can be monitored quite inten-
sively. It should be noted that while the CAM system was originally operated
on the basis of group profiles as contrasted to individual profiles, NRRC/P
will concentrate on the latter.
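
The form-construction rules quoted above (an equal number of items per
instructional interval on every form, each item used on only one form,
random assignment of items to forms) can be sketched as follows. This is
not the CAM project's own software; the function name, the item labels,
and the pool sizes are hypothetical.

```python
import random


def build_monitor_forms(item_pool, n_forms, items_per_interval, seed=None):
    """Randomly assign items to parallel forms.

    item_pool -- dict mapping each instructional interval to its item ids
    Every form receives `items_per_interval` items from every interval,
    and no item appears on more than one form.
    """
    rng = random.Random(seed)
    forms = [[] for _ in range(n_forms)]
    for interval in sorted(item_pool):
        items = list(item_pool[interval])
        needed = n_forms * items_per_interval
        if len(items) < needed:
            raise ValueError(f"interval {interval!r} needs at least {needed} items")
        rng.shuffle(items)  # random sampling without replacement
        for form_index, form in enumerate(forms):
            start = form_index * items_per_interval
            form.extend(items[start:start + items_per_interval])
    return forms


# Hypothetical course with three instructional intervals, six items each.
pool = {unit: [f"unit{unit}-item{i}" for i in range(6)] for unit in (1, 2, 3)}
forms = build_monitor_forms(pool, n_forms=3, items_per_interval=2, seed=1970)
for number, form in enumerate(forms, start=1):
    print(f"form {number}: {form}")
```

Because every form spans the whole course, a pupil's score on successive
forms should rise as more intervals are taught, which is the "barometer"
reading described above.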

Mathematical models are usually rare in the field of education. It is
one thing to develop a psychological theory for a phenomenon; it is another
to subject that theory to a rigorous mathematical modeling process. Pinsky
(1970) has done just that with CAM. Further, the CAM originators have gone so
far as to provide canned computer analyses for processing all of the monitoring
data (Gorth, Grayson, and Lindeman, 1969; Gorth, Grayson, and Stroud, 1969).

CAM has been tried out successfully and realistically in a number of
different situations. While O'Reilly et al. (1969) have summarized the
technical details of how to implement every phase of a CAM system in future
instances, Pinsky (1970, pp. 45-68) has given judgmental evaluations of systems
already in operation. Several pilot locations have been selected for CAM pro-
jects: Duluth, Minnesota (for two consecutive years in a high school); Kailua,
Hawaii (for three consecutive years with 11th and 12th grade trigonometry and
algebra, and for two consecutive years with 11th grade American history);
Hopkins, Minnesota (for two consecutive years with 11th grade algebra); Portland,
Oregon (for three consecutive years with 9th grade algebra). Thus, one can
see that CAM has been tried with what are perhaps some of the most complicated
subjects.

Each test form is called a "monitor." While each overall operation of
CAM (with the exception of Duluth) can be termed a success, Pinsky (1970) none-
theless points out some operational difficulties that one is likely to encounter.
A parallel-form monitor is usually given to each child once every two weeks
throughout a course. Often a child will not take a monitor at the time it
should be taken. Sometimes teachers do not have enough time to make use of the
feedback data, or, if they do have time, will not put such data to full use.
Turnaround time for processing the monitor data either by computer or by hand
may discourage some. However, the benefits seem to outweigh by far the dis-
advantages of CAM. If used cautiously, monitor feedback data can be used to
program for the deficiencies of a given child, or, at a different level, to
change the general programming for an entire group of children. Successful
performance on monitors by certain students can allow them to go into independent
study or to advance more quickly, rather than be branched back over poorly
learned material. Further, the program gets out of the old rut of pitting
student against student and makes an individual compete only with himself on
whatever time schedule he feels he can handle.

The CAM Project is one of the few intensive ongoing CRM systems that
is operating presently. As such, CAM deserves a long hard look at just what
the operational and organizational requirements are. The "Guide ..." of
O'Reilly et al. (1969) gives such details.

Cox (1970) has examined some conceptual difficulties of CRM with regard
to technical issues of test construction (reliability, validity, and item
analysis). He notes a trend in CRM in that most applications have been in the
domain of individualized instruction. Cox begins his discussion of the technical
issues by distinguishing between CRM and NRM: "When an achievement test is
constructed as a norm-referenced measure the test items are written or selected
to maximize differences between individuals. Maximum discrimination is de-
sirable to obtain the variability necessary for ranking individuals." How-
ever, Cox (1970) goes on to describe item analysis techniques from an earlier
study (Cox and Vargas, 1966) that might be more appropriate for CRM: "Two
discrimination indices were computed for items on tests which had been admin-
istered both as pre and post-tests. The question of interest was the extent
to which the two methods of item analysis yield the same relative evaluation
of items. One index was computed using the common upper minus lower groups
technique, thus providing information on how well each item discriminated between
these groups. The second index involved both the pre and post-test and was
computed by subtracting the percentage of pupils who passed the item on the
pre-test from the percentage who passed the item on the post-test. This in-
dex provided discrimination information between pre and post-test groups,
indicating items useful for pre-test diagnosis. Results of the comparison
between the two indices indicated that some items which are highly desirable
for the pre-post test discrimination would be discarded by the typical item
selection techniques, because they fail to discriminate among individuals taking
the test. It was concluded that the pre and post-test method of the item
analysis produced results sufficiently different from traditional methods to
warrant its consideration in those cases where score variability is not the
concern, such as criterion-referenced measures." In terms of diagnostic
procedures in special education, the pre-post CRM item analysis techniques seem
to hold a great deal of usefulness. Cox (1970) concludes his examination of CRM
technical issues by suggesting that perhaps the usual coefficients used to
measure reliability and validity might not be appropriate because of the lack
of enough variability.
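
The two item-analysis indices described in the quotation can be compared
on a toy data set. The sketch below is an illustration under stated
assumptions, not a reproduction of Cox and Vargas's computations: it works
in proportions rather than percentages, splits the group in halves for the
upper-minus-lower index, and uses invented response data.

```python
def upper_lower_index(responses, item, fraction=0.5):
    """Upper-minus-lower discrimination for one item on one administration.

    responses -- one list of 0/1 item scores per pupil
    Pupils are ranked by total score; the index is the proportion passing
    the item in the top group minus the proportion in the bottom group.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    return sum(p[item] for p in upper) / k - sum(p[item] for p in lower) / k


def pre_post_index(pre, post, item):
    """Proportion passing the item on the post-test minus that on the pre-test."""
    return sum(p[item] for p in post) / len(post) - sum(p[item] for p in pre) / len(pre)


# Invented data: four pupils, three items, before and after instruction.
pre = [[0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
post = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1]]

# Item 0 is failed by everyone before instruction and passed by everyone after.
print(pre_post_index(pre, post, 0))   # large pre-post gain
print(upper_lower_index(post, 0))     # no discrimination among pupils
```

Item 0 shows the situation the quotation describes: it would be discarded
by the traditional index because it does not separate pupils, yet it is
exactly the kind of item that reflects the effect of instruction.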

The idea of using an achievement monitoring system in special education 
is not entirely new. Kunzelman (1968) described what he termed "data decisions." 
(However, to our knowledge, the use of a monitoring system in special education 
that makes rigorous use of CRM in a legitimate way is new.) Kunzelman wants 
educators to engage in "self-help teaching," such as has been developed by the 
Experimental Education Unit of the Mental Retardation and Child Development 
Center at the University of Washington. Basically, Kunzelman's system consists 
of recording both correct and wrong rates of response for children within a 
teacher's class for a given content subject area. For example, if a certain 
teacher has been having particular trouble in getting one of her students to 
master a certain concept in arithmetic, she might decide to use a somewhat 
different tactic of individualized instruction than she had been using. To 
determine the relative effectiveness of the old and new approaches with that 
child, the teacher would have to maintain both correct and incorrect rates of 
response for a few days in arithmetic both before and after the point at which 
remediation tactics were changed. However, the difficult problems of just how 
much behavior to sample, when to sample, how to sample, etc., are not 
discussed by Kunzelman. These are precisely the issues met head-on by CRM 
systems such as Project CAM of the University of Massachusetts. 
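
As a rough illustration of the bookkeeping such a before-and-after comparison 
involves, the following Python sketch pools correct and incorrect response 
rates over the days on either side of a change in tactics. The names Session, 
mean_rates, and compare_tactics, and the choice of responses per minute as the 
unit, are assumptions made for this example only; Kunzelman's own recording 
procedures are not reproduced here. 

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One observation period for one child (hypothetical record layout)."""
    correct: int      # number of correct responses observed
    incorrect: int    # number of incorrect responses observed
    minutes: float    # length of the observation period in minutes


def mean_rates(sessions):
    """Return (correct per minute, incorrect per minute) pooled over sessions."""
    total_minutes = sum(s.minutes for s in sessions)
    if total_minutes <= 0:
        raise ValueError("total observation time must be positive")
    correct_rate = sum(s.correct for s in sessions) / total_minutes
    incorrect_rate = sum(s.incorrect for s in sessions) / total_minutes
    return correct_rate, incorrect_rate


def compare_tactics(before, after):
    """Change in both rates from the old tactic (before) to the new (after)."""
    before_correct, before_incorrect = mean_rates(before)
    after_correct, after_incorrect = mean_rates(after)
    return {
        "correct_rate_change": after_correct - before_correct,
        "incorrect_rate_change": after_incorrect - before_incorrect,
    }
```

A rise in the correct rate together with a fall in the incorrect rate would 
favor the new tactic; the sketch says nothing about how many days or how much 
behavior must be sampled, which is exactly the question left open above. 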

Emrick and Adams (1970) have provided what appear to be sounder cutoff 
points for making a "success-failure" determination on CRM tests. They use 
as their examples situations from the Individually Prescribed Instruction 
Project (IPI) of the Learning Research and Development Center at the University 
of Pittsburgh. Because IPI makes heavy use of CRM, such new mathematical models 
as Emrick and Adams propose are highly relevant to NRRC/P. The authors state: 
"IPI currently maintains an 85% correct minimum as a mastery criteria for any 
skill test (of which there are over 400). Although this criteria does have 
intuitive appeal, there is no convenient analytical or empirical justification 
for it. In particular, just as various skills may differ in level of difficulty 
in terms of mastery, so also might the optimal performance criteria in the test 
situation vary. It may easily be that for some skills, a test score of 60% is 
indicative of mastery, whereas for others a score of 90% or higher would be 
required. In short, the issue is not whether a criterion referenced testing 
procedure is or is not appropriate to IPI, but rather how and at what level each 
criterion should be set." Emrick and Adams go on to propose a Bayesian algorithm 
for determining "success-failure" cutoff points. 
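
To make concrete the idea that the appropriate cutoff can differ from skill to 
skill, the following Python sketch applies Bayes' rule to a simple two-state 
model. It is not the algorithm of Emrick and Adams, whose details are not given 
here. It assumes, purely for illustration, that a child is either a master or a 
non-master of the skill, that each group answers any item correctly with a 
fixed probability (p_master, p_nonmaster), and that a prior probability of 
mastery is available; all names and default values are hypothetical. 

```python
from math import comb


def posterior_mastery(k, n, prior=0.5, p_master=0.9, p_nonmaster=0.5):
    """P(master | k correct out of n items) under a two-state binomial model."""
    like_master = comb(n, k) * p_master**k * (1 - p_master) ** (n - k)
    like_nonmaster = comb(n, k) * p_nonmaster**k * (1 - p_nonmaster) ** (n - k)
    evidence = prior * like_master + (1 - prior) * like_nonmaster
    return prior * like_master / evidence


def cutoff_score(n, threshold=0.5, **model):
    """Smallest number correct whose posterior mastery probability >= threshold."""
    for k in range(n + 1):
        if posterior_mastery(k, n, **model) >= threshold:
            return k
    return None  # no score on this test reaches the threshold
```

Under these assumed values, a ten-item test on a skill where non-masters 
succeed on half the items calls for 8 correct, while a skill where non-masters 
succeed on only one item in five calls for 6 correct, so a single fixed 
percentage would not suit both. 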

Lundin (1970) has considered the role of CRM as a means to process 
evaluation of curricular materials still in developmental stages. He describes 
the experiences of the Minnesota Mathematics and Science Center Staff 
(MINNEMAST) in this regard. The research and evaluation team of MINNEMAST 
called their CRM system DRATS ("Domain Referenced Achievement Test Systems"). 
DRATS uses item sampling to extract the maximum amount of in-process 
information. While Lundin supports DRATS in particular and CRM in general, he 
warns: "if decisions based on sophisticated data do not result in improved 
student learning, then one can do without the luxury of sophisticated data 
until one develops sophisticated decision makers." 

Several detailed descriptions of the CAM Project of the University of 
Massachusetts have been given recently (cf. Allen, 1970; O'Reilly, 1970, and 
Gorth, 1970). In particular, some key features of Project CAM in terms of CRM 
have been brought up by Allen, Gorth, and Wightman (1970). The authors state: 
"CAM measures achievement in a systematic way throughout a course in the 
secondary or elementary school. It is comprehensive in two dimensions: 1. Time 
because achievement is measured throughout a course and 2. Course content 
because achievement is measured on all of the behavioral objectives specified 
for a course at each time. CAM uses several of the most modern techniques in 
educational measurement to obtain the goals it sets for reliability and 
validity. The techniques include item sampling which has recently been 
developed by Frederick Lord and longitudinal testing which has often been 
recommended to measure change or growth. Both of these ideas have been tied to 
computer programs for rapid analysis and reporting of the results to students, 
teachers, and administrators." 

Another major set of benefits derived by CAM, claim Allen, O'Reilly, and 
Gorth (1970), is that: "At each test administration, performance on objectives 
not yet taught is pretested, performance on objectives just taught is 
immediately post-tested, and performance on objectives taught earlier in the 
course is measured for retention." 

Flexibility is another major virtue of the CAM system: "Monitors are 
intended to be short tests, perhaps ten to thirty items. Whether or not a 
single form covers all objectives for a course is a function of the proportion 
of objectives to items per form. It may be necessary to randomly sample 
(without replacement) the objectives, before doing the same on the test items 
for each selected objective." 
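
The two-stage sampling described in this passage can be sketched in Python as 
follows. This is a minimal illustration, not CAM's actual computer program: 
the function name build_monitor, the layout of the item pool (a dictionary 
mapping each objective to a list of item identifiers), and the seed parameter 
are all assumptions of the example. 

```python
import random


def build_monitor(item_pool, n_objectives, items_per_objective, seed=None):
    """Assemble one short monitor by two-stage sampling without replacement.

    item_pool maps an objective label to the list of items written for it
    (hypothetical layout). Objectives are sampled first, then items within
    each selected objective.
    """
    rng = random.Random(seed)
    # Stage 1: sample objectives without replacement.
    chosen_objectives = rng.sample(sorted(item_pool), n_objectives)
    monitor = []
    for objective in chosen_objectives:
        # Stage 2: sample items without replacement within the objective.
        items = rng.sample(item_pool[objective], items_per_objective)
        monitor.extend((objective, item) for item in items)
    return monitor
```

With, say, four objectives and three items per objective, the resulting form 
has twelve items, which falls within the ten-to-thirty-item range quoted 
above. 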


Allen, O'Reilly, and Gorth (1970) describe several different types of 
feedback, at either the individual or group level: "For individual students: 
After each administration: 1) total score on that and all previous 
administrations, 2) a graphic presentation of the above, 3) a 'right-wrong' 
indication for each item on the monitor, coded by the objective represented. 
At the end of the course: 4) average scores, across all monitors taken, on 
items categorized by use into three groups -- pretest, immediate 
post-instruction and retention of varying lengths of time. For whole group or 
subgroups (e.g., one classroom, highest and lowest quartiles): After each 
administration: 1) percent answered correctly out of all items across all 
monitors, for each objective. Periodically, as desired (e.g., every 3-5 
administrations): 2) trend data, or achievement profiles, for total score and 
for each objective. At the end of the course: 3) same as number 4 under 
individual students, 4) item analysis (using whole group only), treating each 
item in three separate ways, by its three functions -- pretest, immediate 
post-instruction, and retention measure." 

CHAPTER VI 

A DETAILED DESCRIPTION OF 
A CRITERION-REFERENCED 
MEASUREMENT SYSTEM 
THAT WOULD BE SUITABLE 
FOR SPECIAL STATE-CONNECTED 
PROJECTS, SUCH AS THE NATIONAL 
REGIONAL RESOURCES CENTER 
OF PENNSYLVANIA 

Introduction 

From the previous chapter, the reader should now have a command over 
what the concept of criterion-referenced measurement (CRM) means and what 
some of its advantages and disadvantages are. While formal program evaluation 
in the norm-referenced sense is the most feasible route to follow in 
implementing a large-scale accountability system, for any intensive 
examination of exactly what is happening in special education classes, the 
author feels CRM is the only real answer at present. For these reasons, CRM 
deserves consideration whenever special projects are run that are connected 
with the state Bureau of Special Education or, for that matter, are run 
strictly on the local level. A detailed description of how the projected CRM 
system will operate in the Eastern Suburban Division of the National Regional 
Resources Center of Pennsylvania (NRRC/P) is given here. It should be noted at 
the outset that the NRRC/P CRM system can be modified to accommodate the 
specific needs of any program. 

BACKGROUND ON NRRC/P 

The National Regional Resources Center of Pennsylvania began July 1, 1970, 
with one year of planning. NRRC/P is a cooperative effort that combines the 
resources of several existing public special education agencies in the 
central and eastern third of the Commonwealth of Pennsylvania. The principal 
investigator of this federally funded, long-range project is Dr. William F. 
Ohrtman, Director of the state Bureau of Special Education in Harrisburg. 
NRRC/P is devoted to intensive study of the efficacy of various programming 
techniques used with learning-disabled children of elementary school age 
(in the sense of the national definition of learning disabled). Several 
experimental classes are being established in urban, suburban, and rural areas 
in both the central and eastern portions of the Commonwealth. Rather than 
concentrate on global programming questions of a long-range nature (which 
program evaluation efforts often pursue, and which tend to mask the more 
crucial features of why or why not a total programming technique was 
successful), NRRC/P will be looking intensely at how well small manageable 
units of instruction work with certain types of learning disabled children. 
For example, if one unit of instruction were to have as its primary objective 
the mastery of a certain family of words, then NRRC/P wants to know: (a) what 
level of criterion mastery can be expected over a specified period of time 
with programming approach A that is used to teach the family of words, (b) how 
the level of success achieved with approach A compares with approaches B, 
C, ..., (c) what different levels of mastery are possible with learning 
disabled children of different impairment levels under programming approaches 
A, B, C, ..., and (d) what cost-effectiveness factors enter into approaches 
A, B, C, .... At first glance, this list of questions would seem to be an 
over-ambitious project. However, the criterion-referenced measurement system 
of NRRC/P has allowed for the study of all these problems. This CRM system 
seems to hold several useful implications for any area of special education, 
let alone the learning disabled. 

In dealing with learning disabled and minimally brain damaged pupils, 
individualization of instruction is of utmost importance. For this reason, 
custom psychoeducational programming will be used for each child. However, 
before devising an individualized prescription, some form of small group 
instruction will be used [...] intensive, individualized help. 

The basic program operation of the national project will consist of two 
major components. First, small group instruction will form the central 
component or track about which all individualized efforts will be oriented. 
The intent of the project is to keep the child in the main instructional 
track whenever possible. The small group instruction forms the regular 
education component of the project. Second, whenever a child begins to run 
into severe educational difficulties that cannot be handled in the regular 
small group setting, he will be sent to the resource teacher for one-to-one 
individualized help. There, the resource teacher will try to deal with the 
problem. 

The instructional sequence, whether in the regular small-group situation 
or in the individualized-prescription situation, will be divided into 
"instructional modules." Each module is used only over a relatively short 
period of perhaps [...] weeks. Each module is organized around a set of highly 
specific objectives stated in measurable behavioral terms. Each child's 
achievement both before and after going through the module is measured by 
"monitors," which are special types of tests in the CRM sense. The 
pre-monitor is used before the module is entered upon and the post-monitor is 
used after the child has completed the module. If the child demonstrates 
inadequate achievement on the post-monitor, he is brought into contact with 
the resource teacher for one-to-one individualization of instruction. With 
the resource teacher, the child is recycled through the instructional module 
which he has not mastered in the regular instructional mainstream. After 
using similar forms of post-monitors to check up on the child's understanding 
of the module with which the resource teacher has given him help, a decision 
is made as to whether to send the child back to the instructional mainstream. 



THE CRM SYSTEM OF NRRC/P 

To carry out such an instructional programming system, a corresponding 
measurement and evaluation system must be devised. While standardized tests 
will continue to be used as part of the usual individual psychological 
screening evaluation given to all children in the national project, the 
usability of such tests for measuring change within any given pupil is 
quite limited. First, any standardized test selected will usually embody 
only very global program objectives; the specific instructional objectives 
of a certain module will only be reflected occasionally within a standardized 
test. Second, by their very nature, standardized tests employ norm-referenced 
measurement (NRM). In other words, a child's performance is judged relative to 
normative data gotten from large samples of normal children. While such 
comparisons may be of great use in determining initial placement of a child in 
a special class in terms of his deviation from the standardized data of normal 
children, such comparisons are of little use in gauging the actual progress of 
a particular child relative to his potential. Third, NRM, or standardized 
testing, does not readily lend itself to measuring change in a valid manner 
when repeated measurements on the same material or module are needed. In other 
words, a child becomes attuned to the questions of the test itself after 
receiving more than one administration of the instrument. For these reasons, 
NRRC/P will only use standardized tests in the usual screening and classical 
formal program evaluation, but will emphasize a much more appropriate 
measurement system known as CRM. 

For any given instructional module, criterion-referenced tests will 
be used. Such tests will be known as monitors. The basic advantage of CRM 
is that it gauges the progress of a pupil relative to his own potential in 
terms of predetermined goals of performance levels. Thus, the inappropriate 
comparisons that would occur in fitting an exceptional child's progress 
against that of normal children, that is, NRM, are avoided completely with 
CRM. For these reasons alone, monitors or tests constructed and interpreted 
in the criterion-referenced sense are ideally suited to measuring change in 
children as the result of highly individualized educational prescriptions. 
The particular objectives of any given instructional module are reflected in 
the test items of the monitor for that module. The test items are constructed 
in accord with the best measurement theory available. Both the teachers who 
use the monitors and modules and the measurement specialists who help build 
them are involved in test item selection and construction. Because 
achievement or performance should be measured only relative to the child's own 
initial baseline on that module, both pre- and post-monitors will be used for 
a given module. Further, if a child does not achieve on the post-monitor the 
degree of attainment that his potential and initial level on the pre-monitor 
suggest, then he will have to be recycled through the module in question with 
different supplementary modular material. Again, however, the only way of 
measuring how successful the remediation was is to give the child a different 
but similarly appropriate post-monitor. Thus, several equivalent forms of 
monitors must be constructed for any given module. The basic functioning of 
the CRM system is represented in Figure 3. 
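
The decision cycle just described (pre-monitor, module, post-monitor, and 
recycling with a fresh equivalent form when the criterion is not met) can be 
sketched in Python as below. This is only an illustrative reading of the 
text, not the content of Figure 3; the class name ModuleRecord, its fields, 
and the wording of the returned decisions are assumptions of the example. 

```python
from dataclasses import dataclass, field


@dataclass
class ModuleRecord:
    """One child's history on one instructional module (hypothetical layout)."""
    criterion: float     # predetermined mastery level set for this child
    pre_score: float     # pre-monitor score, the child's own baseline
    post_scores: list = field(default_factory=list)  # one per post-monitor form

    def gain(self):
        """Latest post-monitor score minus the child's own baseline."""
        if not self.post_scores:
            return None
        return self.post_scores[-1] - self.pre_score

    def decision(self, post_forms_available):
        """Next step, given how many equivalent post-monitor forms exist."""
        if not self.post_scores:
            return "administer post-monitor"
        if self.post_scores[-1] >= self.criterion:
            return "advance to next module"
        if len(self.post_scores) < post_forms_available:
            # Never reuse a form the child has already seen.
            return "recycle with a different equivalent form"
        return "construct another equivalent form"
```

Because each recycling consumes one unused form, the number of equivalent 
forms on hand limits how many times a child can be validly re-measured on the 
same module. 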

NRRC/P OPERATIONAL CRM MACHINERY 

When one begins to delve into the details of such a criterion-referenced 
monitoring system, one of the first questions to be answered is how both the 
monitors and instructional modules are constructed, since these two items are 
the heart of the program. One logical way to handle this task would be 
first to examine the range of abilities and deficits in the population of 
learning disabled pupils being served. With such a survey completed, the 
educational programmer/prescriber can project roughly just what total range 
of subject content areas can be expected to be mastered throughout the year. 
Then, following this line of reasoning, during the summer before the start 
of the program, a task force of teachers, administrators, evaluators, and 
other specialists would work feverishly to complete a sufficient number of 
sequentially related instructional modules that would take care of the 
projected range of all pupils, along with corresponding sets of monitors. 
However, NRRC/P will not elect to go this route. First, the job of trying to 
anticipate how far each child will go throughout the year and then building 
enough modules to cover this wide range is far too complicated to accomplish 
with adequate quality simply during the summer. Second, devising all the 
modules and monitors ahead of time tends to lock staff members into a "canned" 
set of programs that will tend to stifle in-process improvements dictated by 
spontaneous problems that always seem to arise. Thus, a more flexible 
monitor-module production system is needed for NRRC/P. 

/ 

Before describing how NRRC/P plans to devise the modules and monitors, 
the reader should be aware of how pupils relate to each other as they move 
from one instructional module to the next. First, let the reader assume that 
all pupils in a given class are able to be handled adequately by the regularly 
assigned teacher in the modified small-group setting. Nonetheless, it must 
be borne in mind that each child is treated as an individual and is allowed 
to move at his own rate through whatever module appears to be appropriate to 
him at his stages of developmental readiness, existing knowledge, and ability. 
In other words, at any given point in time, each child probably will be 
working on a different module. However, eventually every child will pass 
through some of the same modules, since they are devised in a sequential task 
framework. Also, since a given module will always have its corresponding 
monitors used with any child that passes through the module, eventually each 
module will have comparable data obtained from every child in the class. Of 
course, with some of the easier modules and some of the more advanced modules, 
only a few children will ever work their way through them and resulting data 
will be sketchy for the class as a whole at these points in the instructional 
sequence. But then this is the nature of individualized instruction! The 
sequence of steps that any child who is functioning adequately within the 
regularly assigned classroom setting would go through is given in Figure 4. 
Before returning to a discussion of how monitors and modules are constructed, 
let the reader consider the occasions when a child becomes so embroiled in 
his learning difficulties that he must be referred to the resource teacher 
for several days or even longer. 

Highly specialized, individualized help must be provided to any child 
who runs into severe educational problems. In general, a resource 
teacher-consultant will be called in. There are at least two ways in which 
this can occur. First, an itinerant resource teacher will be brought into the 
child's regular class to work with him in that setting. Second, the child will 
be taken out of his regularly assigned class and sent to a special resource 
teacher room for a certain amount of time each day. Regardless of the 
particular method selected, the relationships between the resource teacher 
consultation and the regularly assigned small-group instruction 
(individualized "mainstream") are represented in Figure 5. One can see how the 
decision is made as to when the child returns to the small-group instructional 
setting. 

Next, the matter of how the monitors and modules might best be constructed 
needs to be considered. As in the first method described above but rejected 
by NRRC/P, a survey of the range of ability and weaknesses of each child 
[...] monitors begin to approach reality when the specific objectives (stated 
in measurable terms) of the module in question are agreed upon. A large 
test item pool for all objectives is constructed, with individual test items 
coming from already existing sources or being made on the spot. At least 
three parallel forms of a monitor are constructed for a given module. 
The first parallel form of the monitor serves as a pre-test, the second as 
an immediate post-test, and the third as a second post-test (if the child has 
to be recycled through the same module again because of inadequate criterion 
performance on the first post-test). The three parallel forms are constructed 
by randomly assigning test items for a given objective throughout all three 
forms. This process is repeated for each objective in the module. 
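
A minimal Python sketch of this construction follows. The text specifies only 
that items for each objective are assigned at random across the three forms; 
the round-robin dealing of the shuffled items (so that each form receives a 
nearly equal share per objective), the function name build_parallel_forms, and 
the dictionary layout of the item pool are assumptions added for the example. 

```python
import random


def build_parallel_forms(item_pool, n_forms=3, seed=None):
    """Randomly distribute each objective's items across parallel forms.

    item_pool maps an objective label to its list of test items
    (hypothetical layout). Returns a list of n_forms forms, each a list of
    (objective, item) pairs.
    """
    rng = random.Random(seed)
    forms = [[] for _ in range(n_forms)]
    for objective in sorted(item_pool):
        items = list(item_pool[objective])
        rng.shuffle(items)  # random order, so assignment to forms is random
        # Assumption: deal items round-robin to keep the forms balanced.
        for position, item in enumerate(items):
            forms[position % n_forms].append((objective, item))
    return forms
```

The first form returned could serve as the pre-test, the second as the 
immediate post-test, and the third as the second post-test. 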

Another logical question one might ask about specific operational 
procedures concerns personnel for carrying out the monitoring process. While 
it will be the responsibility of administrative staff to provide the teachers 
with raw working materials -- modules, monitors, instructional materials, 
etc. -- teachers themselves will be required to administer each monitor to 
each child whenever the appropriate time arises. Only the teacher will be able 
to coordinate this activity most efficiently. Further, the teacher will 
be required to grade the monitors herself and to record all data on specially 
devised recording sheets. In this way, the teacher becomes intimately 
involved in diagnostic teaching and provides herself with immediate feedback 
on how each child is doing with the module he is currently involved with. 
Indeed, the big advantage is that the teacher is forced to look at just how 
each child is learning; in other words, accountability (see Proger, in press) 
becomes reality. On each teacher's recording sheet, data will be kept on 
monitor scores (in relation to predetermined criterion levels of success 
custom-made for each child), time needed to go through the module, open-end 
comments of the teacher, etc. 

Every week an administrative supervisor will collect a carbon copy 
of the data recording sheet of each teacher. This carbon copy is returned 
to the evaluation department for processing and analyzing. All data will 
be entered on computer cards according to a predetermined format. Eventually, 
after a large number of children go through the same module with the same 
corresponding monitors, evaluation personnel will be able to draw some 
generalizations about how certain types of children learn the modular 
subject-matter material in question. Reports will be generated and 
disseminated at local, state, and national levels. 
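
To suggest what a "predetermined format" and the later pooling by module might 
look like, the following Python sketch defines one record per child per module 
and a simple summary across children. The field names, the SheetEntry class, 
and the particular summary statistics are assumptions made for illustration; 
the actual NRRC/P recording sheet and card layout are not specified in this 
chapter. 

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class SheetEntry:
    """One line of a teacher's recording sheet (hypothetical layout)."""
    child_id: str
    module_id: str
    pre_score: float       # pre-monitor score
    post_score: float      # post-monitor score
    criterion: float       # criterion level custom-made for this child
    days_in_module: int    # time needed to go through the module
    comment: str = ""      # open-end teacher comment


def summarize_by_module(entries):
    """Pool entries across children to describe each module."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.module_id].append(entry)
    summary = {}
    for module_id, rows in grouped.items():
        summary[module_id] = {
            "children": len(rows),
            "mean_gain": mean(r.post_score - r.pre_score for r in rows),
            "proportion_at_criterion":
                sum(r.post_score >= r.criterion for r in rows) / len(rows),
            "mean_days": mean(r.days_in_module for r in rows),
        }
    return summary
```

Summaries of this kind become meaningful only after many children have passed 
through the same module with the same monitors, as the paragraph above notes. 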


SUMMARY 

An individual achievement monitoring system for special education has 
been described. The criterion-referenced nature of this monitoring system 
has been explained, in distinction to the usual norm-referenced measurement 
procedures of standardized testing. The details of this CRM system as 
projected for use in the National Regional Resources Center of Pennsylvania 
project have been outlined. 

MACHINERY FOR IMPLEMENTATION 
OF THE FORMAL PROGRAM 
EVALUATION SYSTEM FOR SPECIAL 
EDUCATION AT THE STATE 
LEVEL: PERSONNEL AND 
DATA-BANKING ACTIVITIES 

STATE IMPLEMENTATION MACHINERY 

The previous section has dealt with regional implementation of the 
formal program evaluation model. I personally feel that local consultation 
agencies are definitely crucial to any statewide system. However, regardless 
of how local special education personnel obtain expert consultation on 
how to carry out their particular program evaluation activities, all data 
collected must be assimilated by the state Bureau of Special Education, 
analyzed, interpreted, and used for policy decisions wherever appropriate. 
In this section, the implementation machinery at the state level will be 
considered. 

First, the state Bureau of Special Education would need some specialized 
evaluation and measurement personnel. Ideally, such people would be 
well-versed in evaluation and measurement methodology, statistical analysis, 
and data processing and computer programming. There are two primary sources 
for obtaining such people. First, the United States Office of Education has 
turned out hundreds of such specialists at the doctoral level through the 
Educational Research Fellowship Training Program. Second, several colleges 
and universities have Master's programs in educational research. There 
should be no difficulty in getting hold of qualified personnel. 



The next question to be resolved concerns where such program evaluation 
specialists would be housed and from what budgets they would be paid. There 
is no doubt that major planning, policy, and operational decisions should be 
made by evaluation specialists who have had extensive training and experience 
at the doctoral level. Such personnel need not necessarily be paid by or 
housed within the Bureau itself. Evaluation specialists of doctoral calibre 
can be gotten on a consultant basis (paid or non-paid) from universities or 
special federal projects. In fact, no doubt special federal projects could 
be initiated whose sole function would be to field test the feasibility of 
such a program evaluation model; the Director of the Bureau could be made 
Principal Investigator so as to maintain Bureau control over the evaluation 
activities. 

Once expert measurement personnel at the doctoral level have been obtained 
on a part-time consultant basis or full-time basis, more routine operational 
details could be accomplished by measurement specialists at the Master's level. 
Again, salaries and facilities could be handled directly out of the Bureau, 
or special agencies outside the state Department of Education could be funded 
(perhaps federally) to handle such evaluation tasks of a day-to-day nature 
with regard to data processing and analysis. 



After one has considered the question of personnel in depth, he must 
next deal with facilities for storing and analyzing data and information 
obtained from separate evaluation projects. Such computer facilities already 
exist within the state Department of Education and a large number of county 
or intermediate unit operations. Cooperative arrangements could be explored 
at the state and regional levels. Possibilities in colleges and universities 
would also be considered. Keypunching facilities devoted exclusively to a 
statewide program evaluation system in special education are a necessity. 
In collecting the data for storage and analysis, a feasible, standardized 
format for inputting the data must be devised. All such operational 
details could be handled by measurement specialists and other consultants 
of whom the Bureau of Special Education would want to avail themselves. 

In terms of both personnel and facilities for processing and analyzing, 
the obvious suggestion would be to use existing arrangements, wherever 
satisfactory, within the state Department of Education and state-affiliated 
organizations and projects. Nonetheless, wherever currently available 
resources would clearly not be able to handle the tasks of a formal program 
evaluation system, then specialized personnel and facilities must be 
obtained that would be devoted solely to special education purposes. 

This chapter will be concluded by devoting some detailed comments to 
the concepts of data-banking. To give the reader some working ideas of just 
what data-banking activities consist of, a review of the literature is 
presented. 

One of the key features of the NRRC/P research program will be its 
emphasis on CRM, although not to the complete exclusion of standardized or 
NRM testing. In order to answer the types of questions posed at the start 
of this paper, it is imperative that a large amount of information be 
stored so that it can later be retrieved for various types of pooling 
operations via different analytical strategies. The data bank (DABA) concept 
is a vehicle for such activities. A brief review of the technical literature 
in this field will be helpful for those interested in the functioning of 
NRRC/P. 

The need for a DABA in a large project engaged in massive testing programs 
is apparent and yet feasibility studies with the DABA idea are scarce. Austin 
(1970, p. 9) claims: "Those of us who are concerned with the processing of 
large scale testing programs have, in the past few years, made considerable 
progress in the area of high-speed test scoring. ... In the area of 
record-keeping, or data banking, we have done little." 

Since the establishment of a data bank (DABA) is one of the primary 
enabling objectives of NRRC/P in working toward the questions posed at 
the start of this paper, a few comments on operational difficulties are in 
order. Fascione and Penry (1970) describe the experiences of the School 
District of Philadelphia in trying to generate a data bank. They point out 
that not many educators, let alone other types of technicians, are familiar 
with just what DABA implies. Further, they warn against getting caught up in 
the sometimes grandiose ideas of systems analysts, computer workers, etc. 
Fascione and Penry (1970) suggest: "Primary responsibility for system 
design and implementation both should be placed somewhere in the 
organizational structure other than in the data processing area. This crucial 
initial step helps to retain the project's focus on the human aspects of the 
problem. Another way of describing the benefits of this approach is to say 
that it helps prevent the tail from wagging the dog, which results from the 
data processing technologists' natural inclinations to (1) have the latest 
and most sophisticated equipment; (2) justify the computer's presence by 
utilizing all its capacity; (3) set the actual goals for the system rather 
than have them set by administrators." In line with this philosophy, 
Fascione and Penry recommend that DABA managers set out to produce immediate 
benefits for those to be served, rather than harping on what tremendous 
things will happen with long-range goals. Some immediate benefits that 
resulted in the School District of Philadelphia were: (a) compilation of 
student attendance and background lists for administrators, (b) capacity to 
conduct longitudinal studies of certain children, (c) capacity to draw more 
valid and representative random samples of certain types of students for 
ongoing evaluation studies, and (d) keeping track much more efficiently of 
standardized testing results. Fascione and Penry describe their DABA system as 
being based plainly upon cards rather than tapes. Each child is kept on a 
card, with such background characteristics as name, birth date, sex, ID 
number, address, home telephone number, school grade, room assignment, etc. 
An up-to-date punched card deck is maintained within each school building. 
Every two months, these card decks are re-processed in entirety to give 
up-dated lists of students. 

The CAM Project at the University of Massachusetts also maintains what
is, in effect, a DABA (cf. Gorth, Grayson, Popejoy, and Strowd, 1969). With
a CRM system such as CAM, a huge amount of performance data is obtained with
respect to individual test items, groups of test items relating to one
behavioral objective, groups of behavioral objectives relating to one large
program objective, and, turning along a different dimension of data
generation, data on individual students, groups of students, etc. The
comparisons are almost endless. Thus, the need for a highly efficient DABA
is evident.
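
The two dimensions of data generation described above can be pictured as a small roll-up of the same responses along different groupings. The Python sketch below is only an illustration under stated assumptions: the item-to-objective mapping, the student names, and the scores are hypothetical and are not taken from the CAM Project.

```python
from collections import defaultdict

# Hypothetical mapping: test item -> behavioral objective -> program objective.
ITEM_TO_OBJECTIVE = {"item1": "BO-1", "item2": "BO-1", "item3": "BO-2"}
OBJECTIVE_TO_PROGRAM = {"BO-1": "PO-A", "BO-2": "PO-A"}

# Hypothetical responses: (student, item, 1 if correct else 0).
RESPONSES = [
    ("ann", "item1", 1), ("ann", "item2", 0), ("ann", "item3", 1),
    ("bob", "item1", 1), ("bob", "item2", 1), ("bob", "item3", 0),
]


def proportion_correct(responses, key):
    """Group responses by key(student, item); return proportion correct per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [number correct, number attempted]
    for student, item, correct in responses:
        group = key(student, item)
        totals[group][0] += correct
        totals[group][1] += 1
    return {group: right / tried for group, (right, tried) in totals.items()}


# One dimension: items, behavioral objectives, program objectives.
by_item = proportion_correct(RESPONSES, lambda s, i: i)
by_objective = proportion_correct(RESPONSES, lambda s, i: ITEM_TO_OBJECTIVE[i])
by_program = proportion_correct(
    RESPONSES, lambda s, i: OBJECTIVE_TO_PROGRAM[ITEM_TO_OBJECTIVE[i]]
)
# The other dimension: individual students.
by_student = proportion_correct(RESPONSES, lambda s, i: s)
```

Each additional grouping is one more call on the same stored responses, which is why the number of possible comparisons grows so quickly and why efficient storage matters.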

Perhaps the classic example of the sophisticated DABA is that associated
with Project TALENT (Flanagan, Cooley, Shaycroft, Hall, VanWormer, Wingersky,
and Holdeman, 1965). "Although the term 'data bank' is sometimes used to
refer to any accumulation of data, it is important to recognize that some
accumulations will be more useful than others. It seems preferable to reserve
the term 'data bank' for data collected with some over-all basic design and
for which research uses were originally considered. This does not necessarily
mean that the data must have been collected solely for research purposes, but
it does mean that no sound research principles were violated in the
data-collection process. (p. 1)"

Flanagan et al. (1965) list seven features that they consider essential
to a meaningful data bank: (a) the data gathered must relate to a population
of students that has been previously defined in a deliberate and careful
manner with randomization present, rather than inadvertently defined
populations; (b) as many variables as possible should be tapped; (c) if
possible, a large number of variables should be measured on a large sample;
(d) data in the bank should be easily accessible; (e) all data collected
should be comparable with respect to type of instrument, time of
administration, conditions of measurement, etc.; (f) data should be organized
within the bank so that complex relationships can be derived by computer; and
(g) data recorded at different points in time should be interrelated for the
same students, and any factors that may have affected such relationships
should also be able to be tied in. Flanagan et al. state: "The administration
of the Project TALENT tests to nearly half a million students in over 1,300
schools constituted the first of several phases of data collection. More
than 2,000 items of information per student and 1,000 items per school were
collected. Some of these have been summarized in the form of test scores
and others have been transferred directly to magnetic tape, currently stored
at the Computation and Data Processing Center of the University of Pittsburgh.
A series of follow-up studies has been planned for one, five, ten, and twenty
years after each of the (four) classes in the sample graduates from high
school. (p. )" Thus, long-range career patterns will be able to be related
to original patterns of education, as well as a host of other variables.
This was one of the main objectives of Project TALENT.

The NRRC/P DABA will hardly be as extensive as Project TALENT's DABA, but
the concepts of operation will be highly similar. The idea of collecting
periodic information on what is happening throughout the remediation process
used with the student and trying to relate such data to different strategies
of remediation, as well as background variables on the student, will be a
primary goal.

One of the most exhaustive studies of the educational DABA idea was the
series of reports contained in Carroll et al. (1965). A series of conferences
held by the Harvard Graduate School of Education debated issues connected with
the DABA concept. In one report, Benjamin Bloom (pp. 30-37) discussed some
problems: (a) the originators of a particular DABA determine at the outset
what types of information are most important; (b) whether the DABA should
act as a service center or a research center; (c) the possible conflicts
between individual research efforts and team research projects; (d) the
possible invasion of privacy; (e) whether DABA's evolve over time with
improvements or maintain their original structure. As an example of an
existing DABA, Bloom mentioned the International Educational Achievement
Study. Test results in mathematics were collected for 200,000 students of
ages 13 to 17 or 18 from the United States, eight European countries,
Israel, Japan, and Australia.

In the DABA report of Carroll et al. (1965), a conference of school
superintendents resulted in recommendations of the types of questions they
would like answered. Two examples (p. *5) are (a) "What is the relationship
between subjects or courses of study pursued in high school and the occupation
the student enters after graduation?" and (b) "What preschool experiences
best prepare a child for school experience, especially with regard to
reading and motivation to learning?" However, the superintendents pointed out
that a DABA has inherent limitations because "for practically every question
considered to be of great importance, the data to answer the question were
virtually inaccessible. ...inaccessibility does not mean that data do not
exist, but rather that the effort required to retrieve them or rearrange them
manually would be so great that the data are, for all practical purposes, not
at all accessible (p. 1 »6)."

Another example of a functioning DABA given by Carroll et al. (1965)
is the New England Education Data Systems (NEEDS). The report states: "the
realities of running a school - such things as production of schedules,
report cards, class lists, attendance records - because of their immediacy,
require attention and time. NEEDS seeks to provide ways to reduce the
time and attention taken by these clerical tasks, thereby releasing the
administrator and his staff for more important, creative work such as
assessment and reorganization of the curriculum (p. 73)." NEEDS includes nine
communities in Massachusetts, two in Connecticut, two in Vermont, one in
Rhode Island, and one in New Hampshire. The four divisions of NEEDS are
(a) data processing services, (b) operations research and development,
(c) in-service training, and (d) basic research and formal instruction. The
basic services offered are (a) file creation and maintenance, (b) scheduling
support, (c) mark reporting, (d) automated attendance, and (e) test scoring
and analysis.

A second major illustration of a functioning DABA found in Carroll et al.
(1965) is the Iowa Educational Information Center (IEIC), sponsored by the
College of Education at the University of Iowa and the State Department of
Public Instruction. The data banking and data processing activities of IEIC
are similar to those of NEEDS.

The report of Carroll et al. (1965, p. 20) recommended that at least three
types of data be considered for any DABA: (a) "demographic data (age, sex,
socio-economic status of parents, and other data which are essentially
sociological)," (b) "descriptive data (class size, pupil-teacher ratio, and
other summary statistical data which describe characteristics of the school
or the student population, personnel, etc.)," and (c) "evaluative data
(tests, student grades, and other data for evaluation of student progress,
teacher success, curricular validity, etc.)." The report concludes with an
extensive bibliography of DABA literature.
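
The three recommended types of data can be read as three parts of a single student record. The Python sketch below is an illustration only: the class names, field names, and sample values are hypothetical and do not come from the Carroll et al. report, which names the categories but prescribes no record layout.

```python
from dataclasses import dataclass, field


@dataclass
class DemographicData:
    age: int
    sex: str
    parent_ses: str  # socio-economic status of parents


@dataclass
class DescriptiveData:
    class_size: int
    pupil_teacher_ratio: float


@dataclass
class EvaluativeData:
    test_scores: dict = field(default_factory=dict)  # test name -> score
    grades: dict = field(default_factory=dict)       # subject -> grade


@dataclass
class StudentRecord:
    student_id: str
    demographic: DemographicData
    descriptive: DescriptiveData
    evaluative: EvaluativeData = field(default_factory=EvaluativeData)


# Made-up example record.
record = StudentRecord(
    "S001",
    DemographicData(age=9, sex="F", parent_ses="middle"),
    DescriptiveData(class_size=12, pupil_teacher_ratio=6.0),
)
record.evaluative.test_scores["reading"] = 34
```

Keeping the evaluative part separate from the other two lets new test results be added over the year without touching the background information.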

Miami (Dade County, 1967) has taken the lead in a statewide DABA operation.
The system will (a) provide teachers with periodic background reports on
students, (b) help curriculum planners evaluate particular programs,
(c) establish mutual feedback between the schools and colleges, and
(d) provide guidance counselors with student information reports. Using Miami
as a pivot point, four different counties in Florida tried out different
techniques associated with a DABA system to test the feasibility of a
statewide DABA operation.

Other examples of the DABA concept are readily found and will not be
detailed here. The value of such systems, when and if they become
sophisticated enough, is that any type of remediation used with a student can
be evaluated in terms of the effects it had on the student relative to other
approaches (cf. Grossman and Howe, 1966; McComb, Miss., 1967; St. Louis Park,
Minn., 1967; Sacramento, Calif., 1966; Edina, Minn., 1966; Mount Clemens,
Mich., 1967; Davenport, Iowa, 1966; Lincoln, Nebr., 1967; Buffalo, N.Y., 1966;
Eugene, Oreg., 1966).

"Formative" and "summative" are merely synonyms for
"process" and "product" types of evaluation, respectively.

The only ways in which the models still aid me as a professional
evaluator are: (a) to show to a client during face-to-face
program evaluation consultation what the general steps in the
process are, and (b) to use as a discussion device during
workshops on program evaluation.

Of course, these assumptions are always open to question.
The goal is to choose meaningful classification schemes for
the children so that these assumptions are at least approximated.
Anyone highly dubious about these assumptions should ask himself
what the alternative would be for evaluating programs without
falling back on the case study method in and of itself.





APPENDIX A 



SUGGESTED PRIORITIES 

OF DISSEMINATION 
OF THIS FIRST DRAFT 






I.    Bureau of Special Education (Dr. Ohrtman,
      Dr. Cogen, and Staff) and Bureau of
      Quality Assessment

II.   Special Education Experts from Teacher
      Training Institutions across State

III.  Panel of Measurement Experts from Across
      Nation

IV.   Major special education administrative
      personnel from I.U.'s, private schools, and
      parochial schools.

V.    Selected groups of special education
      teachers.

APPENDIX B

GUIDELINES FOR PROFESSIONAL
USAGE OF ACCOUNTABILITY
DATA AT LOCAL OR STATE
LEVELS WITH EITHER TOTAL
PROGRAM EVALUATION OR
INDIVIDUAL ACHIEVEMENT
MONITORING



At no time is a teacher or administrator
to feel his job is in jeopardy because his
children appear to be doing "poorly" relative
to some predefined criteria. The data gathered
at local or State levels is for use only by those
respective officials for determining whether
certain programs and techniques (not people)
will be discarded or modified.

In any place where an accountability system is to
be implemented, before a system is allowed to start,
intensive in-service of all faculty (administrators
and teachers) must be undertaken to avoid any
misinterpretations. Complete rapport of staff with
the objectives and philosophy of accountability is
essential.

The state Bureau of Special Education must exert a
leadership role in serving as watchdog over the use
of program evaluation data at the local and state
levels. The state must take appropriate action
wherever misuse of data occurs.

Only those professionals subject to the control of the
Bureau of Special Education (or those delegated by them)
will have functional access to the data banks.

APPENDIX C

POSSIBLE INTERRELATIONSHIPS
AMONG EXISTING AGENCIES
IN CARRYING OUT A
STATEWIDE FORMAL PROGRAM
EVALUATION SYSTEM

APPENDIX D

OUTLINE OF OPERATIONAL
STEPS NEEDED TO IMPLEMENT
A STATEWIDE FORMAL
PROGRAM EVALUATION SYSTEM
IN ITS FIRST YEAR

I. Agree to commit any given special education program's personnel
to collecting data on a regular basis - at least twice a year; for
certain types of performance, at least three times a year. (Most
data will be collected by teachers.)

II. After Step I is accomplished, minimal common program objectives must be
established that all children can be measured on. This step must be
distinguished from individual pupil objectives that a highly specific
educational prescription would embody. Most curriculum guides have
program objectives directly or implicitly stated, although they are not
always as operationally stated as they should be. Because program
objectives are a lot easier to agree upon than are individual pupil
objectives, Step II should be able to be achieved quickly and without
much trouble.

III. Select tests, rating scales, informal inventories, etc., that are
readily available and which yield data in terms of developmental norms
(developmental ages, mental ages, grade equivalents, etc.) that can be
interpreted easily by workers in the field. This does NOT preclude
also using locally derived measuring instruments, but for broad
evaluation of program goals, commonly recognized measuring devices are best.

IV. Hold in-service meetings with teachers and other special education
personnel to ensure that everyone understands how to administer the
instruments selected in Step III. Purposes of the program evaluation system
are also explained in detail to the staff. (Lack of communication
between administrators and teachers is a primary source of in-process
failure of many attempted programs.)

V. Teachers (and, to a lesser extent, other more specialized personnel that
may be required for the more "exotic" tests in the battery chosen in
Step III) administer all tests at start of year over as short a
period of time as possible. Control of class atmosphere, and to a
greater extent, presence of teacher aides, will be a major enabling
vehicle here. It is also implied in this step that the same tests
will be given at the end of the year (starting early enough before
the end of the school year to allow sufficient time for everyone to
be evaluated). This step, of course, with pre- and post-measures on
every child, is the heart of the MINIMUM DATA-BANKING ACTIVITIES
REQUIRED IN A P.E. DESIGN.

VI. Thus far, Steps I through V have enabled the P.E. design to provide
only raw, uninterpreted data. For interpretation of this data,
additional machinery is required. Minimal data processing facilities
should be available (as they are in a large number of I.U.'s). A
standard format for punching data on all children on all measures onto
computer cards should be arranged. Regardless of whether or not such
data will be analyzed in sophisticated statistical ways, such
computerized data will at least yield printouts of how each child in the
program has progressed throughout the year. Even in such minimal
printouts of data-banked information, a foundation for decision-making
is achieved. NOTE: Duplicate computer cards of every child will be
fed back to the Bureau of Special Education in Harrisburg. If all
ongoing programs would participate, the state would finally be able to
maintain a very current picture (or "account") of what is happening
throughout the Commonwealth.

VII. For purposes of gaining some types of rough standards of how much
progress can be expected of children with a given degree of potential
(or, by the same concept, a given degree of disability in a certain area),
all children should be coded on their respective data-bank computer
cards with: (a) the degree of potential (dividing the children
into three or four groups on the I.Q. continuum), and/or (b) the
degree of disability on certain selected characteristics obtained on
the child's latest psychological evaluation (i.e., data independent
of the pre- and post-measures obtained in the P.E. design). The
advantage of coding the children into certain meaningful classifications
is that finally program administrators will be able to say to school
boards, teachers, parents, and other groups how much progress can
usually be expected with a child of given disability and/or potential.
No one is yet able to provide such answers. This is one of the basic
purposes of a data bank. NOTE: We are not trying to establish norms
for, say, the moderately retarded; i.e., with such norms, if one carried
the norm-referenced idea to completion, he could say, in all sincerity,
that a child who initially tested as a moderately retarded youngster in
his first psychological evaluation and who now is not demonstrating
academic performance in line with expectation for such a child, is
"abnormally abnormal"! The data-banking idea in special education is
primarily meant to yield information with which to judge the overall
success of programs.

VIII. An optional step with regard to the minimum data-banking machinery
described in Steps I through VII is to take the data-bank information
and place it within a research design framework with the hope
of comparing one programming technique with another. Up to the current
step, it has been assumed that all children of a certain characterization
are undergoing the same type (speaking in general program-philosophic
terms) of programming approach. However, in this step,
it is recognized that some special education organizations (school
systems, I.U.'s, etc.) might wish to compare different programming
techniques on the same general types of children. Such comparisons
can be accomplished in this step if careful records are kept and
children are assigned to different techniques in accord with good
research design.
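
The data-handling steps outlined above (a standard card format, progress printouts, coding by degree of potential, and assignment to techniques) can be pictured with a short Python sketch. It is an illustration under stated assumptions, not a specification from this draft: the card columns, the I.Q. cut points, the technique labels "A" and "B", and all scores are hypothetical.

```python
import random
from statistics import mean


# Hypothetical card layout; every column position is an assumption:
# cols 1-6 child ID, 7 potential code, 8 technique code, 9-11 pre, 12-14 post.
def punch(child_id, potential, technique, pre, post):
    """Format one child's record as an 80-column card image."""
    return f"{child_id:<6}{potential}{technique}{pre:>3}{post:>3}".ljust(80)


def read_card(card):
    """Parse a card image back into a dict with integer scores."""
    return {
        "child_id": card[0:6].strip(),
        "potential": card[6],
        "technique": card[7],
        "pre": int(card[8:11]),
        "post": int(card[11:14]),
    }


def potential_code(iq, cuts=(50, 70, 85)):
    """Code a child into one of four I.Q. bands (cut points are made up)."""
    return str(1 + sum(iq >= cut for cut in cuts))


def assign_within_groups(ids_by_group, techniques, seed=0):
    """Randomly assign children to techniques separately within each group."""
    rng = random.Random(seed)  # fixed seed so the assignment can be re-created
    assignment = {}
    for group in sorted(ids_by_group):
        ids = list(ids_by_group[group])
        rng.shuffle(ids)
        for i, child in enumerate(ids):
            assignment[child] = techniques[i % len(techniques)]
    return assignment


def mean_gain(cards, field):
    """Average the pre-to-post gain of the records grouped by one card field."""
    grouped = {}
    for card in cards:
        r = read_card(card)
        grouped.setdefault(r[field], []).append(r["post"] - r["pre"])
    return {group: mean(gains) for group, gains in grouped.items()}


# Made-up children: (ID, I.Q., pre-score, post-score).
children = [("C00001", 45, 10, 14), ("C00002", 48, 12, 18),
            ("C00003", 75, 20, 30), ("C00004", 78, 22, 30)]
codes = {cid: potential_code(iq) for cid, iq, _, _ in children}
groups = {}
for cid, code in codes.items():
    groups.setdefault(code, []).append(cid)
assignment = assign_within_groups(groups, ["A", "B"])
deck = [punch(cid, codes[cid], assignment[cid], pre, post)
        for cid, _, pre, post in children]
duplicate_deck = list(deck)  # the copy forwarded to the state bureau

for card in deck:  # minimal printout: ID, pre-score, post-score, gain
    r = read_card(card)
    print(r["child_id"], r["pre"], r["post"], r["post"] - r["pre"])

by_potential = mean_gain(deck, "potential")  # rough standards per group
by_technique = mean_gain(deck, "technique")  # comparison of techniques
```

The averages by potential group correspond to the rough standards of expected progress, and the averages by technique correspond to the optional comparison; a difference between two averages is only a starting point and would still need to be judged with an appropriate statistical test.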



